Setting y-axis limit in matplotlib

Question: Setting y-axis limit in matplotlib

I need help with setting the limits of the y-axis in matplotlib. Here is the code that I tried, unsuccessfully.

import matplotlib.pyplot as plt

plt.figure(1, figsize = (8.5,11))
plt.suptitle('plot title')
ax = []
aPlot = plt.subplot(321, axisbg = 'w', title = "Year 1")
ax.append(aPlot)
plt.plot(paramValues,plotDataPrice[0], color = '#340B8C', 
     marker = 'o', ms = 5, mfc = '#EB1717')
plt.xticks(paramValues)
plt.ylabel('Average Price')
plt.xlabel('Mark-up')
plt.grid(True)
plt.ylim((25,250))

With the data I have for this plot, I get y-axis limits of 20 and 200. However, I want the limits 20 and 250.


Answer 0

Try this. It works for subplots too:

axes = plt.gca()
axes.set_xlim([xmin,xmax])
axes.set_ylim([ymin,ymax])
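A minimal sketch of the same idea with two subplots, using made-up data. Either call set_ylim on the Axes object you already hold, or make a subplot current with plt.sca() and then use plt.gca() as above. The Agg backend line is an assumption added only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless; omit it in a notebook
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot([1, 2, 3], [30, 120, 200])  # illustrative data
ax2.plot([1, 2, 3], [5, 10, 15])

# Option 1: call set_ylim on the Axes object you already have.
ax1.set_ylim([20, 250])

# Option 2: make a subplot the "current" Axes, then go through plt.gca().
plt.sca(ax2)
axes = plt.gca()
axes.set_ylim([0, 20])
```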

Answer 1

Your code works also for me. However, another workaround can be to get the plot’s axis and then change only the y-values:

x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,25,250))


Answer 2

One thing you can do is set the axis range yourself using matplotlib.pyplot.axis:

from matplotlib import pyplot as plt
plt.axis([0, 10, 0, 20])

[0, 10] is the x-axis range and [0, 20] is the y-axis range.

Or you can also use matplotlib.pyplot.xlim or matplotlib.pyplot.ylim:

plt.ylim(-2, 2)
plt.xlim(0,10)

Answer 3

You can create an Axes object with matplotlib.pyplot.axes() and call set_ylim() on it. It would be something like this:

import matplotlib.pyplot as plt
axes = plt.axes()
axes.set_ylim([0, 1])

Answer 4

This worked at least in matplotlib version 2.2.2:

plt.axis([None, None, 0, 100])

This is probably a convenient way to set only some of the limits, for example just xmin and ymax.


Answer 5

To add to @Hima’s answer, if you want to modify a current x or y limit you could use the following.

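import numpy as np  # you probably already do this, so no extra overhead
import matplotlib.pyplot as plt

data = np.random.rand(50, 2)  # placeholder data; use your own two-column array
fig, axes = plt.subplots()
axes.plot(data[:, 0], data[:, 1])
xlim = axes.get_xlim()
# example of how to zoom out by a factor of 0.1 (widen the x range by 10% around its centre)
factor = 0.1
new_xlim = (xlim[0] + xlim[1]) / 2 + np.array((-0.5, 0.5)) * (xlim[1] - xlim[0]) * (1 + factor)
axes.set_xlim(new_xlim)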

I find this particularly useful when I want to zoom out or zoom in just a little from the default plot settings.
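The same formula can be wrapped so it zooms both axes at once. This is a minimal sketch; the helper name zoom() is hypothetical and not part of matplotlib. A positive factor widens the range, a negative one narrows it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


def zoom(ax, factor):
    """Hypothetical helper: widen (factor > 0) or narrow (factor < 0) both
    axis ranges around their centres, using the same formula as above."""
    for get_lim, set_lim in ((ax.get_xlim, ax.set_xlim), (ax.get_ylim, ax.set_ylim)):
        low, high = get_lim()
        centre = (low + high) / 2
        half_width = (high - low) / 2 * (1 + factor)
        set_lim(centre - half_width, centre + half_width)


fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 100)
zoom(ax, 0.1)  # x range becomes about (-0.5, 10.5), y range about (-5, 105)
```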


Answer 6

This should work. Your code works for me, as it does for Tamás and Manoj Govindan. You could try updating Matplotlib. If you can’t update Matplotlib (for instance, if you have insufficient administrative rights), using a different backend with matplotlib.use() might help.


Answer 7

Just for fine tuning: if you want to set only one of the boundaries of the axis and leave the other boundary unchanged, you can use one or more of the following statements:

plt.xlim(right=xmax) #xmax is your value
plt.xlim(left=xmin) #xmin is your value
plt.ylim(top=ymax) #ymax is your value
plt.ylim(bottom=ymin) #ymin is your value

Take a look at the documentation for xlim and for ylim
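A small sketch (illustrative data, headless backend assumed) showing that passing only top= leaves the lower limit where it was; calling plt.ylim() with no arguments returns the current limits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

plt.plot([0, 1, 2], [20, 120, 200])  # illustrative data
plt.ylim(0, 200)         # set both limits first
plt.ylim(top=250)        # now change only the upper limit
ymin, ymax = plt.ylim()  # with no arguments, ylim() returns the current limits
print(ymin, ymax)        # the bottom is still 0, the top is now 250
```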


Answer 8

If an axes (generated by code below the code shown in the question) is sharing the range with the first axes, make sure that you set the range after the last plot of that axes.


Convert a list of characters into a string

Question: Convert a list of characters into a string

If I have a list of chars:

a = ['a','b','c','d']

How do I convert it into a single string?

a = 'abcd'

Answer 0

Use the join method of the empty string to join all of the strings together with the empty string in between, like so:

>>> a = ['a', 'b', 'c', 'd']
>>> ''.join(a)
'abcd'
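The string you call join() on is placed between the items, so repeating the call with other separators makes it clear what the empty string is doing here:

```python
chars = ['a', 'b', 'c', 'd']

print(''.join(chars))    # abcd  (empty separator)
print('-'.join(chars))   # a-b-c-d
print(', '.join(chars))  # a, b, c, d
```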

Answer 1

This works in many popular languages like JavaScript and Ruby; why not in Python?

>>> ['a', 'b', 'c'].join('')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'

Strangely enough, in Python the join method is on the str class:

# this is the Python way
"".join(['a','b','c','d'])

Why is join not a method of the list object, as in JavaScript or other popular scripting languages? It is one example of how the Python community thinks. Since join returns a string, it should be placed in the string class, not in the list class, so the str.join(list) method means: join the list into a new string using str as a separator (in this case str is an empty string).

Somehow I got to love this way of thinking after a while. I can complain about a lot of things in Python design, but not about its coherence.
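One practical consequence of putting join on str is that it accepts any iterable of strings, not only lists. A short illustration (dict ordering as shown assumes Python 3.7 or later):

```python
# str.join accepts any iterable of strings, not only lists.
from_tuple = "".join(("x", "y", "z"))               # 'xyz'
from_generator = "".join(c.upper() for c in "abc")  # 'ABC'
from_dict = ",".join({"k1": 1, "k2": 2})            # 'k1,k2' (iterating a dict yields its keys)
print(from_tuple, from_generator, from_dict)
```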


Answer 2

If your Python interpreter is old (1.5.2, for example, which is common on some older Linux distributions), join() may not be available as a method on string objects, and you will instead need to use the string module. Example:

a = ['a', 'b', 'c', 'd']

try:
    b = ''.join(a)

except AttributeError:
    import string
    b = string.join(a, '')

The string b will be 'abcd'.


Answer 3

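This may be the fastest way (it only works for characters whose code points fit in a single byte):

>>> from array import array
>>> a = ['a','b','c','d']
>>> array('B', map(ord,a)).tostring()
'abcd'

That output is from Python 2. On Python 3, tostring() was removed in 3.9 in favour of tobytes(), which returns bytes, so use array('B', map(ord, a)).tobytes().decode('latin-1') to get 'abcd' as a str.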

Answer 4

The reduce function also works:

from functools import reduce  # required on Python 3, where reduce is no longer a builtin
import operator
h = ['a','b','c','d']
reduce(operator.add, h)
# 'abcd'

Answer 5

If the list contains numbers, you can use map() with join().

E.g.:

>>> arr = [3, 30, 34, 5, 9]
>>> ''.join(map(str, arr))
'3303459'

Answer 6

h = ['a','b','c','d','e','f']
g = ''
for f in h:
    g = g + f

>>> g
'abcdef'

Answer 7

Besides str.join, which is the most natural way, one possibility is to use io.StringIO and abuse writelines to write all elements in one go:

import io

a = ['a','b','c','d']

out = io.StringIO()
out.writelines(a)
print(out.getvalue())

prints:

abcd

When using this approach with a generator function or an iterable which isn’t a tuple or a list, it saves the temporary list creation that join does to allocate the right size in one go (and a list of 1-character strings is very expensive memory-wise).

If you’re low on memory and your input is a lazily evaluated object, this approach is the best solution.
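A minimal sketch of the lazy case described above: writelines() consumes the iterable item by item and calls write() for each, so no intermediate list is built at this step. The generator lazy_chars() is a hypothetical stand-in for whatever lazy source you have.

```python
import io


def lazy_chars():
    """Hypothetical lazy source that yields one character at a time."""
    for c in "abcd":
        yield c


out = io.StringIO()
out.writelines(lazy_chars())  # each item is written as it is produced
result = out.getvalue()
print(result)  # abcd
```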


Answer 8

You could also use operator.concat() like this:

>>> from operator import concat
>>> a = ['a', 'b', 'c', 'd']
>>> reduce(concat, a)
'abcd'

If you’re using Python 3 you need to prepend:

>>> from functools import reduce

since the builtin reduce() has been removed from Python 3 and now lives in functools.reduce().


Convert JSON string to dict using Python

Question: Convert JSON string to dict using Python

I’m a little bit confused with JSON in Python. To me, it seems like a dictionary, and for that reason I’m trying to do that:

{
    "glossary":
    {
        "title": "example glossary",
        "GlossDiv":
        {
            "title": "S",
            "GlossList":
            {
                "GlossEntry":
                {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef":
                    {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

But when I do print dict(json), it gives an error.

How can I transform this string into a structure and then call json["title"] to obtain “example glossary”?


Answer 0

Use json.loads():

import json

j = '{"glossary": {"title": "example glossary"}}'  # your JSON string (shortened here)
d = json.loads(j)
print(d['glossary']['title'])

Answer 1

When I started using json, I was confused and unable to figure it out for some time, but finally I got what I wanted. Here is the simple solution:

import json
m = {'id': 2, 'name': 'hussain'}
n = json.dumps(m)
o = json.loads(n)
print(o['id'], o['name'])

Answer 2

Use simplejson or cjson for speedups:

import simplejson as json

json.loads(obj)

or

import cjson

cjson.decode(obj)

Answer 3

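If you trust the data source, you can use eval to convert your string into a dictionary. Note that this only works if the string is also a valid Python literal; JSON's true, false and null are not, and the example below uses Python syntax (single quotes, True):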

eval(your_json_format_string)

Example:

>>> x = "{'a' : 1, 'b' : True, 'c' : 'C'}"
>>> y = eval(x)

>>> print(x)
{'a' : 1, 'b' : True, 'c' : 'C'}
>>> print(y)
{'a': 1, 'b': True, 'c': 'C'}

>>> print(type(x), type(y))
<class 'str'> <class 'dict'>

>>> print(y['a'], type(y['a']))
1 <class 'int'>

>>> print(y['b'], type(y['b']))
True <class 'bool'>

>>> print(y['c'], type(y['c']))
C <class 'str'>
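To see why eval and JSON parsing are not interchangeable, here is a small comparison. It swaps in ast.literal_eval (standard library), which only evaluates literals and never runs arbitrary code, for the Python-style string, and json.loads for a real JSON string. Each parser rejects the other syntax.

```python
import ast
import json

python_style = "{'a': 1, 'b': True, 'c': 'C'}"  # Python literal syntax
json_style = '{"a": 1, "b": true, "c": null}'    # JSON syntax

from_literal = ast.literal_eval(python_style)  # evaluates literals only
from_json = json.loads(json_style)             # true -> True, null -> None

# Each parser rejects the other syntax:
try:
    json.loads(python_style)
except json.JSONDecodeError as err:
    print("json.loads failed:", err.msg)
try:
    ast.literal_eval(json_style)
except ValueError as err:
    print("ast.literal_eval failed:", err)

print(from_literal)  # {'a': 1, 'b': True, 'c': 'C'}
print(from_json)     # {'a': 1, 'b': True, 'c': None}
```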

How to get a random number between a float range?

Question: How to get a random number between a float range?

randrange(start, stop) only takes integer arguments. So how would I get a random number between two float values?


Answer 0

Use random.uniform(a, b):

>>> import random
>>> random.uniform(1.5, 1.9)
1.8733202628557872

Answer 1

random.uniform(a, b) appears to be what you’re looking for. From the docs:

Return a random floating point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.

See the documentation of the random module.


Answer 2

If you want to generate a random float with N digits after the decimal point, you can do this:

round(random.uniform(1,2), N)

The second argument to round() is the number of decimal places.
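A self-contained version of the snippet above, with N = 2 chosen for illustration. Because of the rounding, the result can occasionally land exactly on an endpoint such as 2.0, which is still inside [1, 2].

```python
import random

N = 2  # number of digits after the decimal point
value = round(random.uniform(1, 2), N)
print(value)  # a float in [1, 2] with at most N decimal places
```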


Answer 3

Most commonly, you’d use:

import random
random.uniform(a, b) # range [a, b) or [a, b] depending on floating-point rounding

Python provides other distributions if you need.

If you have numpy imported already, you can use its equivalent:

import numpy as np
np.random.uniform(a, b) # range [a, b)

Again, if you need another distribution, numpy provides the same distributions as python, as well as many additional ones.
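Newer NumPy code often goes through a Generator object instead of the legacy np.random functions. This sketch assumes NumPy 1.17 or later, where np.random.default_rng exists; the seed is only there to make runs repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # Generator API (NumPy >= 1.17)
samples = rng.uniform(1.5, 1.9, size=5)  # five floats drawn from [1.5, 1.9)
single = rng.uniform(1.5, 1.9)           # a single float
print(samples, single)
```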


How do I read CSV data into a record array in NumPy?

Question: How do I read CSV data into a record array in NumPy?

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R’s read.table(), read.delim(), and read.csv() family imports data to R’s data frame?

Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?


Answer 0

You can use Numpy’s genfromtxt() method to do so, by setting the delimiter kwarg to a comma.

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found in its documentation.
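Since the question asks for a record array, here is a sketch that takes field names from a header row (names=True), lets genfromtxt infer a type per column (dtype=None), and then views the result as np.recarray for attribute access. The in-memory io.StringIO stands in for a file; a path such as 'my_file.csv' works the same way.

```python
import io
import numpy as np

# Stand-in for a CSV file on disk.
csv_text = "x,y,n\n1.0,2,3\n4,5.5,6\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     names=True,     # take field names from the header row
                     dtype=None,     # infer a type per column instead of forcing float
                     encoding=None)  # decode text with the system default encoding
rec = data.view(np.recarray)         # attribute access: rec.x, rec.y, rec.n
print(rec.dtype.names)               # ('x', 'y', 'n')
print(rec.y, rec.n)
```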


Answer 1

I would recommend the read_csv function from the pandas library:

import pandas as pd
df=pd.read_csv('myfile.csv', sep=',',header=None)
df.values
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

This gives a pandas DataFrame – allowing many useful data manipulation functions which are not directly available with numpy record arrays.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table…
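If you start from pandas but still need a NumPy record array, DataFrame.to_records() converts one. A minimal sketch with the same two rows as the example file, read from an in-memory string; the column names a, b, c are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for 'myfile.csv' from the answer.
csv_text = "1.0,2,3\n4,5.5,6\n"

df = pd.read_csv(io.StringIO(csv_text), header=None, names=["a", "b", "c"])
rec = df.to_records(index=False)  # NumPy record array; index=False drops the DataFrame index
print(rec.dtype.names)            # ('a', 'b', 'c')
print(rec.b)
```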


I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call:

Given an input file, myfile.csv:

1.0, 2, 3
4, 5.5, 6

import numpy as np
np.genfromtxt('myfile.csv',delimiter=',')

gives an array:

array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

and

np.genfromtxt('myfile.csv',delimiter=',',dtype=None)

gives a record array:

array([(1.0, 2.0, 3), (4.0, 5.5, 6)], 
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])

This has the advantage that files with multiple data types (including strings) can be imported easily.


Answer 2

I timed the

from numpy import genfromtxt
genfromtxt(fname = dest_file, dtype = (<whatever options>))

versus

import csv
import numpy as np
with open(dest_file,'r') as dest_f:
    data_iter = csv.reader(dest_f,
                           delimiter = delimiter,
                           quotechar = '"')
    data = [data for data in data_iter]
data_array = np.asarray(data, dtype = <whatever options>)

on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

I would recommend the csv-list comprehension method, as it most likely relies on pre-compiled libraries and not on the interpreter as much as NumPy does. I suspect the pandas method would have similar interpreter overhead.


Answer 3

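You can also try recfromcsv(), which can guess data types and return a properly formatted record array. (NumPy 2.0 removed recfromcsv from the main namespace; the migration guide suggests np.genfromtxt with a comma delimiter instead.)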


Answer 4

I tried both NumPy and pandas, and pandas has a lot of advantages:

  • Faster
  • Less CPU usage
  • 1/3 RAM usage compared to NumPy genfromtxt

This is my test code:

$ for f in test_pandas.py test_numpy_csv.py ; do  /usr/bin/time python $f; done
2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
0inputs+24outputs (0major+107147minor)pagefaults 0swaps

23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
0inputs+0outputs (0major+416145minor)pagefaults 0swaps

test_numpy_csv.py

from numpy import genfromtxt
train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')

test_pandas.py

from pandas import read_csv
df = read_csv('/home/hvn/me/notebook/train.csv')

Data file:

du -h ~/me/notebook/train.csv
 59M    /home/hvn/me/notebook/train.csv

With NumPy and pandas at versions:

$ pip freeze | egrep -i 'pandas|numpy'
numpy==1.13.3
pandas==0.20.2

Answer 5

You can use this code to send CSV file data into an array:

import numpy as np
csv = np.genfromtxt('test.csv', delimiter=",")
print(csv)

Answer 6

Using numpy.loadtxt

A fairly simple method, but it requires all the elements to be numeric (float, int and so on).

import numpy as np 
data = np.loadtxt('c:\\1.csv',delimiter=',',skiprows=0)  

Answer 7

This is the easiest way:

import csv

with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))

Now each entry in data is a record, represented as a list of strings, so you have a 2D list of rows. It saved me so much time.


Answer 8

I tried this:

import pandas as p
import numpy as n

closingValue = p.read_csv("<FILENAME>", usecols=[4], dtype=float)
print(closingValue)

Answer 9

I would suggest using PyTables (pip3 install tables). You can save your .csv file to .h5 using pandas (pip3 install pandas):

import pandas as pd
data = pd.read_csv("dataset.csv")
store = pd.HDFStore('dataset.h5')
store['mydata'] = data
store.close()

You can then load your data into a NumPy array easily and quickly, even for huge amounts of data:

import pandas as pd
store = pd.HDFStore('dataset.h5')
data = store['mydata']
store.close()

# Data in NumPy format
data = data.values

Answer 10

This works like a charm:

import csv
with open("data.csv", 'r') as f:
    data = list(csv.reader(f, delimiter=";"))

import numpy as np
data = np.array(data, dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent

UnicodeDecodeError when reading CSV file in Pandas with Python

Question: UnicodeDecodeError when reading CSV file in Pandas with Python

I’m running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error…

   File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
     data = pd.read_csv(filepath, names=fields)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
     return _read(filepath_or_buffer, kwds)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
     return parser.read()
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
     ret = self._engine.read(nrows)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
     data = self._reader.read(nrows)
   File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
   File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
   File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
   File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
   File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:9484)
   File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:10642)
   File "parser.pyx", line 1046, in pandas.parser.TextReader._string_convert (pandas\parser.c:10853)
   File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 6: invalid    continuation byte

The source/creation of these files all come from the same place. What’s the best way to correct this to proceed with the import?


Answer 0

read_csv takes an encoding option to deal with files in different formats. I mostly use read_csv('file', encoding = "ISO-8859-1"), or alternatively encoding = "utf-8" for reading, and generally utf-8 for to_csv.

You can also use one of several alias options like 'latin' instead of 'ISO-8859-1' (see python docs, also for numerous other encodings you may encounter).

See relevant Pandas documentation, python docs examples on csv files, and plenty of related questions here on SO. A good background resource is What every developer should know about unicode and character sets.

To detect the encoding (assuming the file contains non-ascii characters), you can use enca (see man page) or file -i (linux) or file -I (osx) (see man page).
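When the encoding of each of many files is unknown, one option is to try a few candidate encodings in order. The helper below is hypothetical (not a pandas API): it catches UnicodeDecodeError from read_csv and moves on. latin-1 goes last because it accepts every byte and so never fails, though it may silently produce the wrong characters. The demo writes a small cp1252 file, which is not valid UTF-8.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Hypothetical helper: return (DataFrame, encoding used) for the first encoding that works."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    raise last_error


# Demo: a small file written in cp1252.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("city,n\nLa Cañada,1\n".encode("cp1252"))
    path = tmp.name

df, used = read_csv_with_fallback(path)
os.remove(path)
print(used)                # cp1252
print(df["city"].iloc[0])  # La Cañada
```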


Answer 1

Simplest of all solutions:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')

Alternative solution:

  • Open the csv file in the Sublime Text editor.
  • Save the file in UTF-8 format.

In Sublime Text, click File -> Save with Encoding -> UTF-8.

Then, you can read your file as usual:

import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')

Other encodings you may need include:

encoding = "cp1252"
encoding = "ISO-8859-1"

Answer 2

Pandas allows you to specify the encoding, but does not allow you to ignore errors or to automatically replace the offending bytes (newer pandas releases, 1.3 and later, do add an encoding_errors parameter to read_csv). So there is no one-size-fits-all method, but different ways depending on the actual use case.

  1. You know the encoding, and there is no encoding error in the file. Great: you just have to specify the encoding:

    file_encoding = 'cp1252'        # set file_encoding to the file encoding (utf8, latin1, etc.)
    pd.read_csv(input_file_and_path, ..., encoding=file_encoding)
    
  2. You do not want to be bothered with encoding questions, and only want that damn file to load, no matter if some text fields contain garbage. OK, you only have to use Latin-1 encoding, because it accepts any possible byte as input (and converts it to the Unicode character with the same code):

    pd.read_csv(input_file_and_path, ..., encoding='latin1')
    
  3. You know that most of the file is written with a specific encoding, but it also contains encoding errors. A real-world example is a UTF-8 file that has been edited with a non-UTF-8 editor and which contains some lines with a different encoding. Pandas has no provision for special error processing, but Python's open function has (assuming Python 3), and read_csv accepts a file-like object. Typical values for the errors parameter here are 'ignore', which just suppresses the offending bytes, or (IMHO better) 'backslashreplace', which replaces the offending bytes with their Python backslash escape sequence:

    file_encoding = 'utf8'        # set file_encoding to the file encoding (utf8, latin1, etc.)
    input_fd = open(input_file_and_path, encoding=file_encoding, errors = 'backslashreplace')
    pd.read_csv(input_fd, ...)
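A self-contained illustration of case 3, using made-up bytes: a mostly UTF-8 file in which one line was saved as cp1252. Here io.TextIOWrapper over an in-memory buffer plays the role of open(..., encoding=..., errors=...), since both accept the same encoding and errors arguments.

```python
import io

import pandas as pd

# Hypothetical mixed file: mostly UTF-8, but one line was saved as cp1252.
raw = (b"city\n"
       + "Zürich".encode("utf-8") + b"\n"
       + "La Cañada".encode("cp1252") + b"\n")

# Same effect as open(path, encoding="utf8", errors="backslashreplace") on a real file.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf8", errors="backslashreplace")
df = pd.read_csv(text)
print(df["city"].tolist())  # ['Zürich', 'La Ca\\xf1ada']
```

The valid UTF-8 line decodes normally, while the single invalid byte survives as a visible escape you can search for later.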
    

Answer 3

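with open('filename.csv') as f:
    print(f)

After executing this code, the printed file object shows an encoding for 'filename.csv'. Note that this is the encoding Python uses by default to open the file (your platform's preferred encoding), not one detected from the file's contents. Then execute the following:

import pandas as pd
data = pd.read_csv('filename.csv', encoding="encoding as you found earlier")

There you go.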


Answer 4

In my case, Notepad++ reported the file's encoding as UCS-2 LE BOM. The corresponding Python encoding is encoding="utf_16_le".

Hope this helps someone find an answer a bit faster.
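"UCS-2 LE BOM" means little-endian 16-bit code units preceded by the byte-order mark FF FE. Below is a quick standard-library check, using made-up sample bytes, of how two related Python codecs treat that mark. Whether a leftover U+FEFF actually ends up in your column names depends on the reader. Treat this as something to check rather than a guaranteed problem.

```python
# Made-up bytes as saved in "UCS-2 LE BOM": the FF FE mark, then UTF-16-LE text
raw = b"\xff\xfe" + "id,name\n1,Ana\n".encode("utf-16-le")

as_le = raw.decode("utf_16_le")   # keeps the mark as a leading U+FEFF character
as_utf16 = raw.decode("utf_16")   # reads the mark, picks little-endian, and drops it

print(repr(as_le[:3]), repr(as_utf16[:2]))
```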


Answer 5

In my case this worked for Python 2.7:

data = read_csv(filename, encoding="ISO-8859-1", dtype={'name_of_column': unicode}, low_memory=False)

And for Python 3, just:

data = read_csv(filename, encoding="ISO-8859-1", low_memory=False)

Answer 6

Try specifying engine='python'. It worked for me, but I'm still trying to figure out why.

df = pd.read_csv(input_file_path,...engine='python')

Answer 7

I am posting an answer to provide an updated solution and an explanation of why this problem can occur. Say you are getting this data from a database or an Excel workbook. If you have special characters, as in La Cañada Flintridge city, then unless you export the data using UTF-8 encoding, you’re going to introduce errors: La Cañada Flintridge city will become La Ca\xf1ada Flintridge city. If you use pandas.read_csv without any adjustments to the default parameters, you’ll hit the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 5: invalid continuation byte

Fortunately, there are a few solutions.

Option 1: fix the export. Be sure to use UTF-8 encoding.

Option 2: if fixing the export is not available to you and you need to use pandas.read_csv, be sure to include the parameter engine='python'. By default, pandas uses engine='c', which is great for reading large clean files but will fail if anything unexpected comes up. In my experience, setting encoding='utf-8' has never fixed this UnicodeDecodeError (which is unsurprising, since 'utf-8' is already the default). Also, you do not need to use error_bad_lines; however, that is still an option if you REALLY need it (newer pandas versions replace it with on_bad_lines).

pd.read_csv(<your file>, engine='python')

Option 3 is my personal preference: read the file using vanilla Python.

import pandas as pd

data = []

with open(<your file>, "rb") as myfile:
    # read the header separately
    # decode it as 'utf-8', strip the line ending, and split it on the comma (or delimiter)
    header = myfile.readline().decode('utf-8').rstrip('\r\n').split(',')
    # read the rest of the data, silently dropping any bytes that are not valid UTF-8
    # (note: a plain split(',') does not handle quoted fields that contain commas)
    for line in myfile:
        row = line.decode('utf-8', errors='ignore').rstrip('\r\n').split(',')
        data.append(row)

# save the data as a dataframe
df = pd.DataFrame(data=data, columns=header)

Hope this helps people encountering this issue for the first time.
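One trade-off of option 3 is worth seeing concretely: errors='ignore' does not repair the text. It silently drops the undecodable bytes. Here is a minimal check on a made-up line that was exported as cp1252 instead of UTF-8:

```python
# Made-up line exported as cp1252 instead of UTF-8 (0xF1 is "ñ" in cp1252)
line = b"La Ca\xf1ada Flintridge\r\n"

ignored = line.decode("utf-8", errors="ignore").rstrip("\r\n")  # the offending byte disappears
correct = line.decode("cp1252").rstrip("\r\n")                  # decoding with the real codec keeps it

print(ignored)
print(correct)
```

If you can find out which encoding the bad lines were actually written in, decoding them with that codec keeps the data intact.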


Answer 8

I struggled with this for a while and thought I’d post on this question, as it’s the first search result. Adding encoding="iso-8859-1" to pandas read_csv didn’t work, nor did any other encoding; it kept giving a UnicodeDecodeError.

If you’re passing a file handle to pd.read_csv(), you need to pass the encoding to open() when you open the file, not to read_csv. A text-mode handle is decoded according to open()'s own settings (the platform default if you gave no encoding), so the encoding passed to read_csv doesn't change how it is read. Obvious in hindsight, but a subtle error to track down.
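Here is a minimal sketch with a made-up temporary Latin-1 file. csv.reader stands in for pd.read_csv so the snippet runs with only the standard library, but the point is the same: the encoding goes on open(), and whatever reads the handle only sees already-decoded text.

```python
import csv
import os
import tempfile

# Write a made-up Latin-1 file containing a non-ASCII character
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write("city\nLa Cañada\n".encode("iso-8859-1"))

# Decoding is configured here, on open(); the reader consuming the handle
# (csv.reader here, pd.read_csv(handle) in the answer) just receives text
with open(path, encoding="iso-8859-1", newline="") as handle:
    rows = list(csv.reader(handle))

os.remove(path)
print(rows)
```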


Answer 9

This answer seems to be the catch-all for CSV encoding issues. If you are getting a strange encoding problem with your header like this:

>>> f = open(filename,"r")
>>> reader = DictReader(f)
>>> next(reader)
OrderedDict([('\ufeffid', '1'), ... ])

Then you have a byte order mark (BOM) character at the beginning of your CSV file. This answer addresses the issue:

Python read csv – BOM embedded into the first key

The solution is to load the CSV with encoding="utf-8-sig":

>>> f = open(filename,"r", encoding="utf-8-sig")
>>> reader = DictReader(f)
>>> next(reader)
OrderedDict([('id', '1'), ... ])

Hopefully this helps someone.
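The same effect can be reproduced without a file on disk. This sketch wraps made-up bytes that start with the UTF-8 BOM (EF BB BF) and reads them once with plain "utf-8" and once with "utf-8-sig":

```python
import csv
import io

# Made-up CSV saved "with BOM": EF BB BF at the very start
raw = b"\xef\xbb\xbfid,name\n1,Ana\n"

def first_row(encoding):
    # Decode the same bytes with the given codec and return the first DictReader row
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return next(csv.DictReader(stream))

plain = first_row("utf-8")      # the BOM sticks to the first header
sig = first_row("utf-8-sig")    # the BOM is consumed by the codec

print(list(plain), list(sig))
```

"utf-8-sig" also reads files that have no BOM, so it is a fairly safe default for CSVs that may have been saved by Excel.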


Answer 10

I am posting an update to this old thread. I found one solution that worked, but it requires opening each file. I opened my CSV file in LibreOffice and chose Save As > Edit filter settings. In the drop-down menu I chose UTF-8 encoding. Then I added encoding="utf-8-sig" to the read_csv call: data = pd.read_csv(r'C:\fullpathtofile\filename.csv', sep=',', encoding="utf-8-sig").

Hope this helps someone.


Answer 11

I had trouble opening a CSV file in Simplified Chinese downloaded from an online bank. I tried latin1, iso-8859-1 and cp1252, all to no avail.

But pd.read_csv("", encoding='gbk') simply does the job.
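This shows why latin1 fails "to no avail" instead of raising an error. latin1 maps every byte to some character, so it never raises, but text written in a multi-byte encoding such as GBK comes out as garbage. Here is a quick check with made-up sample text:

```python
# Made-up header in Simplified Chinese, saved as GBK
raw = "日期,金额\n".encode("gbk")

as_latin1 = raw.decode("latin1")  # never raises, but yields mojibake
as_gbk = raw.decode("gbk")        # the codec the bytes were actually written in

print(as_latin1 == "日期,金额\n", as_gbk == "日期,金额\n")
```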


Answer 12

Please try adding

encoding='unicode_escape'

This should help; it worked for me. Also, make sure you’re using the correct delimiter and column names.

You can start by loading just 1000 rows (for example with nrows=1000) to load the file quickly.
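There is a caveat worth checking before you rely on this. unicode_escape is designed for Python-literal escape sequences, not as a general text encoding. It reads non-escape bytes as Latin-1 and interprets backslash sequences that happen to appear in the data. Here is a quick illustration with made-up bytes:

```python
# Made-up UTF-8 bytes: an accented word and a Windows-style path
raw = "Café,C:\\new\n".encode("utf-8")

decoded = raw.decode("unicode_escape")

# Non-ASCII bytes are read as Latin-1 ("é" becomes "Ã©"), and the backslash-n
# in the path is turned into a real newline character
print(repr(decoded))
```

So if the file really is UTF-8 with a few bad lines, errors='backslashreplace' or the correct codec is usually a safer choice.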


Answer 13

I am using Jupyter Notebook, and in my case the file was shown in the wrong format. The encoding option was not working, so I saved the CSV in UTF-8 format, and then it worked.


Answer 14

Try this:

import pandas as pd
with open('filename.csv') as f:
    data = pd.read_csv(f)

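It looks like this takes care of the encoding without passing it explicitly as an argument. Strictly speaking, open() decodes the file with your platform's default encoding here, so this works when the file was saved in that encoding; it does not detect the encoding from the file.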


Answer 15

Check the encoding before you pass the file to pandas. It will slow you down, but…

with open(path, 'r') as f:
    encoding = f.encoding 

df = pd.read_csv(path,sep=sep, encoding=encoding)

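Tested in Python 3.7. Note that f.encoding reports the encoding open() chose (the platform default, since none was given), not an encoding detected from the file's contents.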


Answer 16

Another important issue that I faced, which resulted in the same error, was:

_values = pd.read_csv(r"C:\Users\Mujeeb\Desktop\file.xlsx")

^This line resulted in the same error because I was reading an Excel file with the read_csv() method. Use read_excel() for reading .xlsx files. (The r prefix keeps Python 3 from treating \U in the Windows path as an escape sequence.)


Expanding tuples into arguments

Question: Expanding tuples into arguments

Is there a way to expand a Python tuple into a function's actual parameters?

For example, here expand() does the magic:

some_tuple = (1, "foo", "bar")

def myfun(number, str1, str2):
    return (number * 2, str1 + str2, str2 + str1)

myfun(expand(some_tuple)) # (2, "foobar", "barfoo")

I know one could define myfun as myfun((a, b, c)) (a tuple parameter, which only Python 2 allows; PEP 3113 removed that syntax in Python 3), but of course there may be legacy code. Thanks.


Answer 0

myfun(*some_tuple) does exactly what you request. The * operator simply unpacks the tuple (or any iterable) and passes its items as positional arguments to the function. Read more about unpacking arguments.
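Here is the question's own example, with the hypothetical expand() replaced by the unpacking operator:

```python
some_tuple = (1, "foo", "bar")

def myfun(number, str1, str2):
    return (number * 2, str1 + str2, str2 + str1)

# The * in the call unpacks the tuple into three positional arguments
from_tuple = myfun(*some_tuple)
# Any iterable works the same way, e.g. a list
from_list = myfun(*[1, "foo", "bar"])
print(from_tuple, from_list)
```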


Answer 1

Note that you can also expand part of the argument list:

myfun(1, *("foo", "bar"))
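Here is a runnable version using the question's myfun. It also shows that, since Python 3.5 (PEP 448), a single call may contain more than one * unpacking:

```python
def myfun(number, str1, str2):
    return (number * 2, str1 + str2, str2 + str1)

# Ordinary positional arguments first, then the unpacked rest
partly_expanded = myfun(1, *("foo", "bar"))

# Several * unpackings in one call (Python 3.5+)
several = myfun(*(1,), *["foo", "bar"])

print(partly_expanded == several)
```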

Answer 2

Take a look at sections 4.7.3 and 4.7.4 of the Python tutorial. They cover passing tuples as arguments.

I would also consider using named parameters (and passing a dictionary) instead of using a tuple and passing a sequence. I find the use of positional arguments to be a bad practice when the positions are not intuitive or there are multiple parameters.
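The dictionary variant mentioned above uses ** instead of *. Here is a small sketch with the question's myfun and a made-up params dict:

```python
def myfun(number, str1, str2):
    return (number * 2, str1 + str2, str2 + str1)

# ** matches dict keys to parameter names, so the order of the entries does not matter
params = {"str2": "bar", "number": 1, "str1": "foo"}
result = myfun(**params)
print(result)
```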


Answer 3

This is the functional programming approach. It lifts tuple expansion out of syntactic sugar into an ordinary function:

apply_tuple = lambda f, t: f(*t)

Example usage:

from toolz import * 
from operator import add, eq

apply_tuple = curry(apply_tuple)

thread_last(
    [(1,2), (3,4)],
    (map, apply_tuple(add)),
    list,
    (eq, [3, 7])
)
# Evaluates to True

Redefining apply_tuple with curry saves a lot of partial calls in the long run.
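Here is a sketch of the same pipeline using only the standard library, for readers without toolz installed. functools.partial plays the role of curry, and itertools.starmap turns out to be the built-in form of mapping apply_tuple over a sequence of tuples:

```python
from functools import partial
from itertools import starmap
from operator import add

# The answer's helper: call f with the tuple t unpacked into positional arguments
apply_tuple = lambda f, t: f(*t)

pairs = [(1, 2), (3, 4)]

# Fix the function argument with functools.partial instead of toolz.curry
via_partial = list(map(partial(apply_tuple, add), pairs))

# itertools.starmap does the same unpacking for every tuple
via_starmap = list(starmap(add, pairs))

print(via_partial, via_starmap)
```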


Generator expressions vs. list comprehensions

Question: Generator expressions vs. list comprehensions

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

Answer 0

John’s answer is good (list comprehensions are better when you want to iterate over something multiple times). However, it’s also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won’t work:

def gen():
    return (something for something in get_some_stuff())

print(gen()[:2])      # generators don't support indexing or slicing
print([5,6] + gen())  # generators can't be added to lists

Basically, use a generator expression if all you’re doing is iterating once. If you want to store and use the generated results, then you’re probably better off with a list comprehension.

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
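When you only need a slice of a generator, itertools.islice gives a lazy slice. Otherwise, converting to a list with list() makes the list operations available. In this sketch, the hypothetical get_some_stuff() from the answer is replaced by a simple range:

```python
from itertools import islice

def gen():
    # Stand-in for the answer's hypothetical get_some_stuff()
    return (n * n for n in range(10))

try:
    gen()[:2]
    sliced_directly = True
except TypeError:
    sliced_directly = False  # generators don't support slicing

first_two = list(islice(gen(), 2))  # lazy slice, then materialize
combined = [5, 6] + list(gen())     # list() first, then list concatenation works
print(sliced_directly, first_two, combined[:4])
```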


Answer 1

Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension creates the entire list in memory first, while the generator expression creates the items on the fly, so you can use it for very large (even infinite!) sequences.
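Here is a quick way to see both points. sys.getsizeof only measures the container object itself (a shallow size), but that is enough to show the difference. The infinite case relies on itertools.count:

```python
import sys
from itertools import count, islice

squares_list = [x * x for x in range(100_000)]
squares_gen = (x * x for x in range(100_000))

# The list holds 100,000 references; the generator only holds its current state
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)

# A generator expression can wrap an infinite iterator, as long as you only take what you need
evens = (n for n in count() if n % 2 == 0)
first_evens = list(islice(evens, 5))

print(list_is_bigger, first_evens)
```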


Answer 2

Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.

See Generator expressions and list comprehensions for more info.


Answer 3

The important point is that the list comprehension creates a new list. The generator creates an iterable object that will “filter” the source material on the fly as you consume the bits.

Imagine you have a 2TB log file called “hugefile.txt”, and you want the content and length for all the lines that start with the word “ENTRY”.

So you try starting out by writing a list comprehension:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

This slurps up the whole file, processes each line, and stores the matching lines in your list. This list could therefore contain up to 2TB of content. That’s a lot of RAM, and probably not practical for your purposes.

So instead we can use a generator to apply a “filter” to our content. No data is actually read until we start iterating over the result.

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

Still nothing has been read, but we’ve now specified two generators that will act on our data as we wish.

Let’s write out our filtered lines to another file:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

So instead of “pushing” data to your output function in the form of a fully populated list, you’re giving the output function a way to “pull” data only when it’s needed. In our case this is much more efficient, but not quite as flexible. Generators are one-way and one-pass: the data from the log file we’ve read gets discarded immediately, so we can’t go back to a previous line. On the other hand, we don’t have to worry about keeping data around once we’re done with it.
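Here is a scaled-down, runnable version of the same pipeline. The log text is made up, and io.StringIO stands in for the 2TB file. The hypothetical tracked() wrapper exists only to record when lines are actually pulled from the "file":

```python
import io

# Made-up log contents standing in for hugefile.txt
logfile = io.StringIO(
    "ENTRY short\n"
    "INFO ignored\n"
    "ENTRY " + "x" * 90 + "\n"
)

lines_read = []

def tracked(lines):
    # Records each line at the moment the pipeline actually pulls it
    for line in lines:
        lines_read.append(line)
        yield line

entry_lines = ((line, len(line)) for line in tracked(logfile) if line.startswith("ENTRY"))
long_entries = ((line, length) for (line, length) in entry_lines if length > 80)

read_before = len(lines_read)                 # nothing has been read yet
result = [line for line, _ in long_entries]   # consuming the pipeline triggers the reads
print(read_before, len(lines_read), len(result))
```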


Answer 4

The benefit of a generator expression is that it uses less memory since it doesn’t build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.

For example:

sum(x*2 for x in range(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

The advantage there is that the list is never fully built, so little memory is used (and it may also be faster).

You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memory using generator expressions, since you want the generated list anyway. You also get the benefit of being able to use functions that need a real sequence, such as reversed (sorted, by contrast, accepts any iterable).

For example:

reversed([x*2 for x in range(256)])

Answer 5

When creating a generator from a mutable object (like a list), be aware that the generator reads the list's state at the time the generator is consumed, not at the time it is created:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print(x)
# nothing

If there is any chance of your list (or a mutable object inside it) being modified, but you need the state at the time the generator was created, use a list comprehension instead.
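One refinement of this point is easy to trip over. Python evaluates the outermost iterable of a generator expression immediately, when the expression is created. So what matters is mutating that list object, not rebinding the variable name. Here is a small check:

```python
mylist = ["a", "b", "c"]
gen = (elem + "1" for elem in mylist)
# Rebinding the name has no effect: gen already holds an iterator over the original list object
mylist = ["x"]
rebound = list(gen)

mylist2 = ["a", "b", "c"]
gen2 = (elem + "1" for elem in mylist2)
# Mutating that same list object does show up, because items are read lazily
mylist2.append("d")
mutated = list(gen2)

print(rebound, mutated)
```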


Answer 6

Sometimes you can get away with the tee function from itertools: it returns multiple independent iterators over the same generator.
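Here is a minimal sketch of tee splitting one generator so that two consumers can each make a full pass:

```python
from itertools import tee

squares = (n * n for n in range(5))

# Two independent iterators; items are buffered internally until both have consumed them
first, second = tee(squares)

total = sum(first)
largest = max(second)
print(total, largest)
```

The original generator should not be used after calling tee. Also, if one copy runs far ahead of the other (as sum does here), tee ends up buffering everything in between, and the itertools documentation suggests plain list() is then the faster option.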


Answer 7

I’m using the Hadoop Mincemeat module. I think this is a great example to take a note of:

import mincemeat

def mapfn(k, v):
    for w in v:
        yield 'sum', w
        # yield 'count', 1


def reducefn(k, v):
    r1 = sum(v)
    r2 = len(v)
    print(r2)
    m = r1 / r2
    std = 0
    for i in range(r2):
        std += pow(abs(v[i] - m), 2)
    res = pow((std / r2), 0.5)
    return r1, r2, res

Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math to those numbers using Hadoop’s map-reduce. If I had used a list comprehension instead of yield, calculating the sums and average would have taken much longer (not to mention the space complexity).

Hadoop is a great example for using all the advantages of Generators.
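Note that reducefn itself needs v to be a list, because it calls len(v) and indexes v[i]. If the values arrive as a generator, the count, mean and standard deviation can still be computed in a single pass. The sketch below uses Welford's online algorithm. That is a swapped-in technique, not part of mincemeat or the answer above.

```python
import math

def mean_and_std(values):
    # Welford's online algorithm: one pass, O(1) memory, works on any iterable
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, math.sqrt(m2 / n)  # population std, like reducefn's std / r2

numbers = (x for x in [2, 4, 4, 4, 5, 5, 7, 9])  # stand-in for numbers streamed from a file
count, mean, std = mean_and_std(numbers)
print(count, mean, std)
```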


Python datetime to string without microsecond component

Question: Python datetime to string without microsecond component

I’m adding UTC time strings to Bitbucket API responses that currently only contain Amsterdam (!) time strings. For consistency with the UTC time strings returned elsewhere, the desired format is 2011-11-03 11:07:04 (followed by +00:00, but that’s not germane).

What’s the best way to create such a string (without a microsecond component) from a datetime instance with a microsecond component?

>>> import datetime
>>> print unicode(datetime.datetime.now())
2011-11-03 11:13:39.278026

I’ll add the best option that’s occurred to me as a possible answer, but there may well be a more elegant solution.

Edit: I should mention that I’m not actually printing the current time – I used datetime.now to provide a quick example. So the solution should not assume that any datetime instances it receives will include microsecond components.


Answer 0

If you want to format a datetime object in a specific format that is different from the standard format, it’s best to explicitly specify that format:

>>> datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
'2011-11-03 18:21:26'

See the documentation of datetime.strftime() for an explanation of the % directives.


Answer 1

>>> import datetime
>>> now = datetime.datetime.now()
>>> print unicode(now.replace(microsecond=0))
2011-11-03 11:19:07

Answer 2

In Python 3.6:

from datetime import datetime
datetime.now().isoformat(' ', 'seconds')
'2017-01-11 14:41:33'

https://docs.python.org/3.6/library/datetime.html#datetime.datetime.isoformat
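Here is a reproducible sketch of the same call (Python 3.6+ for timespec) that uses made-up fixed timestamps instead of now(). It also covers the question's two extra concerns: the +00:00 suffix and datetimes that have no microseconds:

```python
from datetime import datetime, timezone

# A made-up timestamp with a microsecond component
stamp = datetime(2011, 11, 3, 11, 13, 39, 278026)
naive_text = stamp.isoformat(sep=" ", timespec="seconds")

# With an aware UTC datetime, the same call also appends the +00:00 offset
utc_text = stamp.replace(tzinfo=timezone.utc).isoformat(sep=" ", timespec="seconds")

# Datetimes whose microsecond is already 0 produce the same shape of string
whole_second_text = datetime(2011, 11, 3, 11, 7, 4).isoformat(sep=" ", timespec="seconds")

print(naive_text, utc_text, whole_second_text, sep="\n")
```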


Answer 3

This is the way I do it. ISO format:

import datetime
datetime.datetime.now().replace(microsecond=0).isoformat()
# Returns: '2017-01-23T14:58:07'

You can replace the ‘T’ if you don’t want ISO format:

datetime.datetime.now().replace(microsecond=0).isoformat(' ')
# Returns: '2017-01-23 15:05:27'

Answer 4

Yet another option:

>>> import time
>>> time.strftime("%Y-%m-%d %H:%M:%S")
'2011-11-03 11:31:28'

By default this uses local time; if you need UTC, you can use the following:

>>> time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
'2011-11-03 18:32:20'

Answer 5

Keep the first 19 characters that you wanted via slicing:

>>> str(datetime.datetime.now())[:19]
'2011-11-03 14:37:50'

Answer 6

I usually do:

import datetime
now = datetime.datetime.now()
now = now.replace(microsecond=0)  # To print now without microsecond.

# To print now:
print(now)

output:

2019-01-13 14:40:28

Answer 7

Since not all datetime.datetime instances have a microsecond component (i.e. when it is zero), you can partition the string on a “.” and take only the first item, which will always work:

unicode(datetime.datetime.now()).partition('.')[0]
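Here is a Python 3 check of this approach, using str() instead of Python 2's unicode() and made-up timestamps. It shows that both the partition approach and a [:19] slice handle datetimes with and without microseconds:

```python
from datetime import datetime

samples = [
    datetime(2011, 11, 3, 11, 13, 39, 278026),  # made-up timestamp with microseconds
    datetime(2011, 11, 3, 11, 13, 39),          # microsecond == 0
]

results = []
for stamp in samples:
    text = str(stamp)  # str() leaves out the fractional part entirely when microsecond is 0
    results.append((text.partition(".")[0], text[:19]))

print(results)
```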

Answer 8

We can try something like the following:

import datetime

date_generated = datetime.datetime.now()
date_generated.replace(microsecond=0).isoformat(' ').partition('+')[0]

Answer 9

I found this to be the simplest way.

>>> t = datetime.datetime.now()
>>> t
datetime.datetime(2018, 11, 30, 17, 21, 26, 606191)
>>> t = str(t).split('.')
>>> t
['2018-11-30 17:21:26', '606191']
>>> t = t[0]
>>> t
'2018-11-30 17:21:26'
>>> 

Answer 10

I use this because I can understand, and hence remember, it better, and the date-time format can be customized to your choice. (Note that plain {} fields are not zero-padded, so 11:07:04 comes out as 11:7:4.)

import datetime
moment = datetime.datetime.now()
print("{}/{}/{} {}:{}:{}".format(moment.day, moment.month, moment.year,
                                 moment.hour, moment.minute, moment.second))
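If you want zero-padded fields while keeping this style, format specs inside the braces handle it. Either use :02 on each integer, or pass strftime-style codes directly to a datetime. The timestamp below is made up so that the output is reproducible:

```python
from datetime import datetime

moment = datetime(2011, 11, 3, 11, 7, 4, 278026)  # made-up timestamp

# Plain {} fields drop leading zeros
plain = "{}/{}/{} {}:{}:{}".format(moment.day, moment.month, moment.year,
                                   moment.hour, moment.minute, moment.second)
# :02 pads each integer to two digits
padded = "{:02}/{:02}/{} {:02}:{:02}:{:02}".format(moment.day, moment.month, moment.year,
                                                   moment.hour, moment.minute, moment.second)
# A datetime also accepts strftime codes as its format spec
via_codes = "{:%d/%m/%Y %H:%M:%S}".format(moment)

print(plain, padded, via_codes, sep=" | ")
```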

Use "import module" or "from module import"?

Question: Use "import module" or "from module import"?

I’ve tried to find a comprehensive guide on whether it is best to use import module or from module import. I’ve just started with Python, and I’m trying to start off with best practices in mind.

Basically, I was hoping someone could share their experiences: what preferences other developers have, and what’s the best way to avoid any gotchas down the road?


Answer 0

The difference between import module and from module import foo is mainly subjective. Pick the one you like best and be consistent in your use of it. Here are some points to help you decide.

import module

  • Pros:
    • Less maintenance of your import statements. Don’t need to add any additional imports to start using another item from the module
  • Cons:
    • Typing module.foo in your code can be tedious and redundant (tedium can be minimized by using import module as mo then typing mo.foo)

from module import foo

  • Pros:
    • Less typing to use foo
    • More control over which items of a module can be accessed
  • Cons:
    • To use a new item from the module you have to update your import statement
    • You lose context about foo. For example, it’s less clear what ceil() does compared to math.ceil()

Either method is acceptable, but don’t use from module import *.

For any reasonably large codebase, if you import * you will likely be cementing it into the module, unable to remove it. This is because it is difficult to determine which items used in the code come from ‘module’, so it is easy to reach the point where you think you don’t use the import any more, but it’s extremely difficult to be sure.
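Here is one concrete gotcha of import * in the standard library. A star import from math brings in math.pow, which silently replaces the built-in pow (whose three-argument form computes a modular power). This sketch runs the star import in a scratch namespace so that the effect is visible without polluting the script itself:

```python
import math

namespace = {}
exec("from math import *", namespace)  # star-import into a scratch namespace

# Every public name of math comes along, including math.pow,
# which now shadows the built-in pow inside that namespace
shadowed = namespace["pow"] is math.pow

builtin_result = pow(2, 3, 5)  # the built-in pow accepts a modulus: (2 ** 3) % 5
try:
    namespace["pow"](2, 3, 5)
    star_call_failed = False
except TypeError:
    star_call_failed = True  # math.pow takes exactly two arguments

print(shadowed, builtin_result, star_call_failed)
```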


Answer 1

There’s another detail here, not yet mentioned, related to writing to a module. Granted, this may not be very common, but I’ve needed it from time to time.

Due to the way references and name binding work in Python, if you want to update some symbol in a module, say foo.bar, from outside that module, and have other importing code “see” that change, you have to import foo a certain way. For example:

module foo:

bar = "apples"

module a:

import foo
foo.bar = "oranges"   # update bar inside foo module object

module b:

import foo           
print(foo.bar)       # if executed after a's "foo.bar" assignment, will print "oranges"

However, if you import symbol names instead of module names, this will not work.

For example, if I do this in module a:

from foo import bar
bar = "oranges"

No code outside of a will see bar as “oranges” because my setting of bar merely affected the name “bar” inside module a, it did not “reach into” the foo module object and update its “bar”.
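Here is a single-file sketch of the same situation. The module foo is hypothetical: it is built in memory with types.ModuleType and registered in sys.modules so that the import statements can find it. Both styles from the answer run against it:

```python
import sys
import types

# Build a hypothetical module "foo" in memory so the example is self-contained
foo_module = types.ModuleType("foo")
foo_module.bar = "apples"
sys.modules["foo"] = foo_module

# Style 1 ("module a" with import foo): rebind the attribute on the shared module object
import foo
foo.bar = "oranges"

# Style 2 ("module a" with from foo import bar): rebinds only this namespace's name
from foo import bar
bar = "pears"

print(sys.modules["foo"].bar)  # other importers see "oranges", never "pears"
```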


Answer 2

Many people have already explained import vs. from ... import, but I want to explain a bit more about what happens under the hood and exactly which places each form changes.


import foo:

Imports foo and creates a reference to that module in the current namespace. You then need to use the full module path to access a particular attribute or method inside the module.

E.g. foo.bar but not bar

from foo import bar:

Imports foo, and creates references to all the members listed (bar). Does not set the variable foo.

E.g. bar but not baz or foo.baz

from foo import *:

Imports foo, and creates references to all public objects defined by that module in the current namespace (everything listed in __all__ if __all__ exists, otherwise everything that doesn’t start with _). Does not set the variable foo.

E.g. bar and baz but not _qux or foo._qux.
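The three forms above can be checked with a minimal sketch. It assumes a throwaway in-memory module named foo, built with types.ModuleType and registered in sys.modules; this is a hypothetical stand-in for a real foo.py. Each statement runs in a fresh namespace through exec, so the output shows exactly which names it binds.

```python
import sys
import types

# Hypothetical module "foo", built in memory instead of being loaded from foo.py
foo = types.ModuleType("foo")
foo.bar = "bar value"
foo.baz = "baz value"
foo._qux = "private value"
sys.modules["foo"] = foo  # later "import foo" statements find this cached module


def names_bound_by(statement):
    """Execute an import statement in a fresh namespace and return the names it bound."""
    namespace = {}
    exec(statement, namespace)
    return sorted(name for name in namespace if name != "__builtins__")


print(names_bound_by("import foo"))           # ['foo']
print(names_bound_by("from foo import bar"))  # ['bar']
print(names_bound_by("from foo import *"))    # ['bar', 'baz']  (no _qux)

foo.__all__ = ["bar"]  # once __all__ exists, * imports only the names listed in it
print(names_bound_by("from foo import *"))    # ['bar']
```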


Now let’s see what happens when we do import X.Y:

>>> import sys
>>> import os.path

Check sys.modules for the names os and os.path:

>>> sys.modules['os']
<module 'os' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc'>
>>> sys.modules['os.path']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>

Check the globals() and locals() namespace dicts for os and os.path:

>>> globals()['os']
<module 'os' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc'>
>>> locals()['os']
<module 'os' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc'>
>>> globals()['os.path']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'os.path'
>>>

From the example above, we see that only os is inserted into the local and global namespaces. So we should be able to use:

>>> os
<module 'os' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc'>
>>> os.path
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>>

But not path.

>>> path
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'path' is not defined
>>>

Once you delete os from the locals() namespace, you can no longer access os or os.path, even though both still exist in sys.modules:

>>> del locals()['os']
>>> os
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'os' is not defined
>>> os.path
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'os' is not defined
>>>

Now let’s talk about from ... import:

>>> import sys
>>> from os import path

Check sys.modules for os and os.path:

>>> sys.modules['os']
<module 'os' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc'>
>>> sys.modules['os.path']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>

sys.modules contains the same entries as it did with the plain import above.

OK, let’s check what the locals() and globals() namespace dicts look like:

>>> globals()['path']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>> locals()['path']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>> globals()['os']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'os'
>>>

You can access the module by the name path, but not as os.path:

>>> path
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>> os.path
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'os' is not defined
>>>

Let’s delete ‘path’ from locals():

>>> del locals()['path']
>>> path
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'path' is not defined
>>>

One final example using an alias:

>>> from os import path as HELL_BOY
>>> locals()['HELL_BOY']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>> globals()['HELL_BOY']
<module 'posixpath' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.pyc'>
>>>

And the name path itself is not defined:

>>> globals()['path']
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
KeyError: 'path'
>>>
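The sessions above are Python 2.7. Here is a rough Python 3 equivalent, offered as a sketch. Each statement runs in its own fresh namespace dict so the two experiments cannot interfere, and the result is compared against the shared sys.modules cache. The helper user_names is introduced only for illustration.

```python
import sys

plain = {}
exec("import os.path", plain)            # binds only the top-level name "os"
from_style = {}
exec("from os import path", from_style)  # binds only the name "path"


def user_names(namespace):
    """Names bound by the import, ignoring the __builtins__ entry exec adds."""
    return sorted(k for k in namespace if k != "__builtins__")


print(user_names(plain))       # ['os']
print(user_names(from_style))  # ['path']

# Both forms share the process-wide module cache and the very same module object
print("os" in sys.modules, "os.path" in sys.modules)                     # True True
print(from_style["path"] is plain["os"].path is sys.modules["os.path"])  # True

# Deleting a binding removes only the name; the cached module stays loaded
del plain["os"]
print("os" in plain, "os" in sys.modules)  # False True
```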

Answer 3

Both ways are supported for a reason: there are times when one is more appropriate than the other.

  • import module: nice when you are using many bits from the module. The drawback is that you need to qualify each reference with the module name.

  • from module import ...: nice because the imported items are usable directly, without the module name prefix. The drawbacks are that you must list each thing you use, and that the code does not make clear where something came from.

Which to use depends on which makes the code clear and readable, and has more than a little to do with personal preference. I lean toward import module generally because in the code it’s very clear where an object or function came from. I use from module import ... when I’m using some object/function a lot in the code.


Answer 4

I personally always use

from package.subpackage.subsubpackage import module

and then access everything as

module.function
module.modulevar

and so on. This gives you short invocations while still clearly tying each routine to its module namespace, which is very useful if you have to search your source for usages of a given module.

Needless to say, do not use import *, because it pollutes your namespace and does not tell you where a given function comes from (which module).

Of course, you can run into trouble if two different modules in two different packages have the same module name, like

from package1.subpackage import module
from package2.subpackage import module

In this case you will of course run into trouble, but that is a strong hint that your package layout is flawed and you have to rethink it.
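The clash is easy to reproduce with two standard-library modules that happen to share a name, concurrent.futures and asyncio.futures. They are used here only for illustration. When you cannot change the layout (for example, with third-party packages), an as alias is a common workaround. This is offered as a pragmatic option, not as a replacement for fixing your own layout.

```python
from concurrent import futures
from asyncio import futures  # silently rebinds "futures"; the first import is shadowed

print(futures.__name__)  # asyncio.futures

# Workaround: give each module a distinct local name
from concurrent import futures as cfutures
from asyncio import futures as afutures

print(cfutures.__name__, afutures.__name__)  # concurrent.futures asyncio.futures
```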


Answer 5

import module

Is best when you will use many functions from the module.

from module import function

Is best when you want to avoid polluting the global namespace with all the functions and types from a module when you only need function.


Answer 6

I’ve just discovered one more subtle difference between these two methods.

If module foo uses the following import:

from itertools import count

Then module bar can mistakenly use count as though it were defined in foo, not in itertools:

import foo
foo.count()

If foo uses:

import itertools

The mistake is still possible, but less likely, because bar would need to write:

import foo
foo.itertools.count()

This caused me some trouble. I had a module that mistakenly imported an exception from a module that did not define it; that module had only imported the exception from another module (using from module import SomeException). When that import was no longer needed and was removed, the offending module broke.
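Here is a small sketch of that accidental re-export. It uses a hypothetical module foo, created in memory from two alternative source strings, and the helper make_module is introduced only for this illustration.

```python
import itertools
import types


def make_module(name, source):
    """Create a throwaway module object by executing source in its namespace."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


foo = make_module("foo", "from itertools import count\n")
print(foo.count is itertools.count)  # True: bar could call foo.count() by mistake

foo = make_module("foo", "import itertools\n")
print(hasattr(foo, "count"))                   # False: count is no longer a foo attribute
print(foo.itertools.count is itertools.count)  # True: only reachable via foo.itertools
```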


Answer 7

Here is another difference not mentioned. This is copied verbatim from http://docs.python.org/2/tutorial/modules.html

Note that when using

from package import item

the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable. The import statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.

Contrarily, when using syntax like

import item.subitem.subsubitem

each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item.
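Both rules can be checked quickly with standard-library names, chosen only for illustration. A from-import accepts either a submodule or a plain attribute, while the dotted import statement rejects a function as its last component. On Python 3.6+ the exception raised is ModuleNotFoundError, a subclass of ImportError.

```python
import types

from xml import dom     # xml.dom is a subpackage, loaded on demand
from json import loads  # loads is a function defined in the json package

print(isinstance(dom, types.ModuleType), callable(loads))  # True True

try:
    exec("import json.loads")  # the last item must be a module or package
except ImportError as exc:
    print(type(exc).__name__, exc)
```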


Answer 8

Since I am a beginner too, I will try to explain this in a simple way. In Python, there are three types of import statements:

1. Generic imports:

import math

This type of import is my personal favorite. Its only downside is that to use any function from the module, you must use the following syntax:

math.sqrt(4)

Of course, this increases the typing effort, but as a beginner it helps you keep track of which module each function belongs to (a good text editor reduces the typing effort significantly and is recommended).

Typing effort can be further reduced by using this import statement:

import math as m

Now, instead of math.sqrt() you can use m.sqrt().

2. Function imports:

from math import sqrt

This type of import is best suited when your code only needs a single function or a few functions from the module, but to use any new item from the module you have to update the import statement.

3. Universal imports:

from math import * 

Although it reduces typing effort significantly, it is not recommended, because it fills your namespace with all the functions from the module, and their names could conflict with the names of user-defined functions. Example:

If you have a function of your very own named sqrt and you import math, your function is safe: there is your sqrt and there is math.sqrt. If you do from math import *, however, you have a problem: namely, two different functions with the exact same name. Source: Codecademy
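The sqrt example can be reproduced in a short script. The user-defined sqrt is hypothetical and returns a tagged string so that the two functions are easy to tell apart. The star import has to sit at module level, because Python 3 rejects import * inside functions.

```python
import math


def sqrt(x):
    """Hypothetical user-defined sqrt, tagged so we can tell it apart from math.sqrt."""
    return f"my sqrt of {x}"


print(sqrt(16))       # my sqrt of 16
print(math.sqrt(16))  # 4.0  (with a plain import, both functions coexist)

from math import *    # noqa: E402,F403  rebinds every public math name, sqrt included

print(sqrt(16))       # 4.0  (the user-defined sqrt is now shadowed)
```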


Answer 9

import package
import module

With import, the token must be a module (a file containing Python commands) or a package (a folder in the sys.path containing a file __init__.py.)

When there are subpackages:

import package1.package2.package
import package1.package2.module

the requirements for folder (package) or file (module) are the same, but the folder or file must be inside package2 which must be inside package1, and both package1 and package2 must contain __init__.py files. https://docs.python.org/2/tutorial/modules.html

With the from style of import:

from package1.package2 import package
from package1.package2 import module

the package or module enters the namespace of the file containing the import statement as module (or package) instead of package1.package2.module. You can always bind to a more convenient name:

a = big_package_name.subpackage.even_longer_subpackage_name.function

Only the from style of import permits you to name a particular function or variable:

from package3.module import some_function

is allowed, but

import package3.module.some_function 

is not allowed.
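Here is a sketch of how the dotted and from-style forms bind names. It uses the standard library's xml.etree.ElementTree as a stand-in for package1.package2.module, and the local name fromstring is just an example of binding a long dotted path to a shorter name.

```python
import xml.etree.ElementTree        # binds only the top-level name "xml"
from xml.etree import ElementTree   # binds the leaf module as "ElementTree"
import xml.etree.ElementTree as ET  # "as" also binds the leaf module

# Bind a long dotted path to a more convenient local name, as described above
fromstring = xml.etree.ElementTree.fromstring

print(ElementTree is ET is xml.etree.ElementTree)  # True
print(fromstring("<item/>").tag)                   # item
```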


Answer 10

To add to what people have said about from x import *: besides making it more difficult to tell where names came from, this throws off code checkers like Pylint. They will report those names as undefined variables.


Answer 11

My own answer to this depends mostly on how many different modules I’ll be using. If I’m only going to use one or two, I’ll often use from ... import, since it means fewer keystrokes in the rest of the file; but if I’m going to use many different modules, I prefer plain import, because then each module reference is self-documenting. I can see where each symbol comes from without having to hunt around.

Usually I prefer the self-documenting style of plain import and only switch to from ... import when the number of times I have to type the module name grows above 10 to 20, even if only one module is being imported.


Answer 12

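One significant difference that, surprisingly, nobody has mentioned is that with a plain import you can access the private (underscore-prefixed) variables and functions of the imported module, which isn’t possible with a from settings import * statement. (Naming a private name explicitly, as in from settings import _private_variable, still works; only the * form skips underscore names.)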


Code in image:

settings.py

public_variable = 42
_private_variable = 141
def public_function():
    print("I'm a public function! yay!")
def _private_function():
    print("Ain't nobody accessing me from another module...usually")

plain_importer.py

import settings
print (settings._private_variable)
print (settings.public_variable)
settings.public_function()
settings._private_function()

# Prints:
# 141
# 42
# I'm a public function! yay!
# Ain't nobody accessing me from another module...usually

from_importer.py

from settings import *
#print (_private_variable) #doesn't work
print (public_variable)
public_function()
#_private_function()   #doesn't work
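Here is a minimal sketch with an in-memory stand-in for settings.py. The module is built with types.ModuleType rather than read from disk, which is an assumption made only to keep the example self-contained. It shows that only the * form skips underscore names, while an explicit from-import of a private name still works.

```python
import sys
import types

# In-memory stand-in for settings.py
settings = types.ModuleType("settings")
exec(
    "public_variable = 42\n"
    "_private_variable = 141\n",
    settings.__dict__,
)
sys.modules["settings"] = settings

star = {}
exec("from settings import *", star)
print("public_variable" in star, "_private_variable" in star)  # True False

explicit = {}
exec("from settings import _private_variable", explicit)
print(explicit["_private_variable"])  # 141
```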

Answer 13

import module: you don’t need any extra effort to use another item from the module. The disadvantage is the redundant typing of the module name.

from module import ...: less typing, and more control over which items of a module can be accessed. To use a new item from the module, you have to update your import statement.


Answer 14

There are some built-in modules that contain mostly bare functions (base64, math, os, shutil, sys, time, …), and it is definitely good practice to keep these bare functions bound to their namespace, which improves the readability of your code. Consider how much more difficult it is to understand what these functions do without their namespace:

copysign(foo, bar)
monotonic()
copystat(foo, bar)

than when they are bound to some module:

math.copysign(foo, bar)
time.monotonic()
shutil.copystat(foo, bar)

Sometimes you even need the namespace to avoid conflicts between different modules (json.load vs. pickle.load)
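For example, json and pickle both define a function called load. With from-imports the second import silently wins, while the module prefix keeps the two apart. Here is a short sketch that uses in-memory files so that no real files are needed.

```python
import io
import json
import pickle

json_file = io.StringIO('{"a": 1}')
pickle_file = io.BytesIO(pickle.dumps({"a": 1}))

# Qualified names: no ambiguity about which load is called
print(json.load(json_file), pickle.load(pickle_file))  # {'a': 1} {'a': 1}

from json import load
from pickle import load  # noqa: E402,F811  rebinds "load"; json's version is shadowed

print(load is pickle.load)  # True
```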


On the other hand, there are some modules that contain mostly classes (configparser, datetime, tempfile, zipfile, …), and many of them have class names that are self-explanatory enough:

configparser.RawConfigParser()
datetime.datetime()
email.message.EmailMessage()
tempfile.NamedTemporaryFile()
zipfile.ZipFile()

so there can be a debate whether using these classes with the additional module namespace in your code adds some new information or just lengthens the code.


Answer 15

I would like to add that there are some things to consider during import calls:

I have the following structure:

mod/
    __init__.py
    main.py
    a.py
    b.py
    c.py
    d.py

main.py:

import mod.a
import mod.b as b
from mod import c
import d

dis.dis shows the difference:

  1           0 LOAD_CONST               0 (-1)
              3 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (mod.a)
              9 STORE_NAME               1 (mod)

  2          12 LOAD_CONST               0 (-1)
             15 LOAD_CONST               1 (None)
             18 IMPORT_NAME              2 (b)
             21 STORE_NAME               2 (b)

  3          24 LOAD_CONST               0 (-1)
             27 LOAD_CONST               2 (('c',))
             30 IMPORT_NAME              1 (mod)
             33 IMPORT_FROM              3 (c)
             36 STORE_NAME               3 (c)
             39 POP_TOP

  4          40 LOAD_CONST               0 (-1)
             43 LOAD_CONST               1 (None)
             46 IMPORT_NAME              4 (mod.d)
             49 LOAD_ATTR                5 (d)
             52 STORE_NAME               5 (d)
             55 LOAD_CONST               1 (None)
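The listing above comes from Python 2.7, and the opcodes differ between versions. To inspect your own interpreter, you can compile similar statements without running them, so mod does not need to exist, and disassemble the result. This is a sketch, and the exact output depends on the Python version.

```python
import dis

source = "import mod.a\nimport mod.b as b\nfrom mod import c\n"
code = compile(source, "<main.py>", "exec")  # compiling does not import anything
dis.dis(code)

# Keep only the import-related instructions and their arguments
imports = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(code)
    if ins.opname in ("IMPORT_NAME", "IMPORT_FROM")
]
print(imports)
```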

In the end they look the same (each ends in a STORE_NAME), but the difference is worth noting if you need to consider the following four circular imports:

Example 1

foo/
   __init__.py
   a.py
   b.py
a.py:
import foo.b 
b.py:
import foo.a
>>> import foo.a
>>>

This works

Example 2

bar/
   __init__.py
   a.py
   b.py
a.py:
import bar.b as b
b.py:
import bar.a as a
>>> import bar.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "bar\a.py", line 1, in <module>
    import bar.b as b
  File "bar\b.py", line 1, in <module>
    import bar.a as a
AttributeError: 'module' object has no attribute 'a'

No dice

Example 3

baz/
   __init__.py
   a.py
   b.py
a.py:
from baz import b
b.py:
from baz import a
>>> import baz.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "baz\a.py", line 1, in <module>
    from baz import b
  File "baz\b.py", line 1, in <module>
    from baz import a
ImportError: cannot import name a

Similar issue, but clearly from x import y is not the same as import x.y as y.

Example 4

qux/
   __init__.py
   a.py
   b.py
a.py:
import b 
b.py:
import a
>>> import qux.a
>>>

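This one also works, at least on Python 2, where import b inside a package is an implicit relative import. Python 3 removed implicit relative imports, so there you would write from . import b instead. Note also that all of these tracebacks are Python 2 output. Python 3.5+ falls back to sys.modules for the from-import in example 3, and 3.7+ does the same for import x.y as y in example 2, so both may succeed on current versions.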
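Here is a sketch of example 3 on a modern Python 3. It writes a hypothetical package named circ_demo to a temporary directory, puts that directory on sys.path, and imports it. On Python 3.5 and later I would expect this to succeed rather than raise the ImportError shown above. For brevity, the temporary directory is not cleaned up.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package "circ_demo" written to a temporary directory:
#   circ_demo/__init__.py   (empty)
#   circ_demo/a.py          from circ_demo import b
#   circ_demo/b.py          from circ_demo import a
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "circ_demo")
os.mkdir(package_dir)
sources = {
    "__init__.py": "",
    "a.py": "from circ_demo import b\n",
    "b.py": "from circ_demo import a\n",
}
for filename, text in sources.items():
    with open(os.path.join(package_dir, filename), "w") as handle:
        handle.write(text)

sys.path.insert(0, root)       # make the temporary package importable
importlib.invalidate_caches()  # let the import system notice the new files

import circ_demo.a  # the example 3 pattern; no ImportError expected on Python 3.5+

print(circ_demo.a.b is sys.modules["circ_demo.b"])  # True
print(circ_demo.b.a is sys.modules["circ_demo.a"])  # True
```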


Answer 16

This is the structure of my current directory:

.  
└─a  
   └─b  
     └─c
  1. The import statement makes all intermediate names available.
    These names have to be qualified (only the top-level name a is bound directly):

    In[1]: import a.b.c
    
    In[2]: a
    Out[2]: <module 'a' (namespace)>
    
    In[3]: a.b
    Out[3]: <module 'a.b' (namespace)>
    
    In[4]: a.b.c
    Out[4]: <module 'a.b.c' (namespace)>
    
  2. The from ... import ... statement remembers only the imported name.
    This name must not be qualified:

    In[1]: from a.b import c
    
    In[2]: a
    NameError: name 'a' is not defined
    
    In[3]: a.b
    NameError: name 'a' is not defined
    
    In[4]: a.b.c
    NameError: name 'a' is not defined
    
    In[5]: c
    Out[5]: <module 'a.b.c' (namespace)>
    

  • Note: Of course, I restarted my Python console between steps 1 and 2.

Answer 17

As Jan Wrobel mentions, one aspect of the different import styles is how the imported names are exposed to other modules.

Module mymath:

from math import gcd
...

Use of mymath:

import mymath
mymath.gcd(30, 42)  # will work though maybe not expected

If I imported gcd only for internal use, and not to expose it to users of mymath, this can be inconvenient. This happens to me pretty often, and in most cases I want to “keep my modules clean”.

Apart from Jan Wrobel’s suggestion to obscure this a bit more by using import math instead, I have started hiding imports from other modules by giving them a leading underscore:

# for instance...
from math import gcd as _gcd
# or...
import math as _math

In larger projects, this “best practice” lets me control exactly what is exposed to subsequent imports and what isn’t. It keeps my modules clean and pays off once a project reaches a certain size.
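Here is a sketch of what the leading underscore buys you. It builds a mymath module in memory as a hypothetical stand-in for mymath.py, and the functions lcm and circle_area are invented for illustration. Note that the underscore only hides names from from mymath import *. Anyone who asks for mymath._gcd explicitly can still reach it.

```python
import sys
import types

# Hypothetical in-memory stand-in for mymath.py
mymath = types.ModuleType("mymath")
exec(
    "from math import gcd as _gcd\n"
    "import math as _math\n"
    "def lcm(a, b):\n"
    "    return abs(a * b) // _gcd(a, b)\n"
    "def circle_area(r):\n"
    "    return _math.pi * r * r\n",
    mymath.__dict__,
)
sys.modules["mymath"] = mymath

star = {}
exec("from mymath import *", star)
print(sorted(k for k in star if k != "__builtins__"))  # ['circle_area', 'lcm']
print(star["lcm"](30, 42))  # 210
print(mymath._gcd(30, 42))  # 6: still reachable through an explicit qualified name
```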