Tag Archives: Python

How do I write data in YAML format to a file?

Question: How do I write data in YAML format to a file?

I need to write the below data to yaml file using Python:

{A:a, B:{C:c, D:d, E:e}} 

i.e., dictionary in a dictionary. How can I achieve this?


Answer 0

import yaml

data = dict(
    A = 'a',
    B = dict(
        C = 'c',
        D = 'd',
        E = 'e',
    )
)

with open('data.yml', 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

The default_flow_style=False parameter is necessary to produce the format you want (block style); otherwise, for nested collections, it produces flow style:

A: a
B: {C: c, D: d, E: e}
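
For comparison, here is a minimal sketch of both behaviors. Note that the default changed in PyYAML 5.1: older releases default to flow style for nested collections, while newer ones default to block style, so the flow-style output below is forced explicitly with default_flow_style=None:

import yaml

data = {'A': 'a', 'B': {'C': 'c', 'D': 'd', 'E': 'e'}}

# Flow style for nested collections (the pre-5.1 default):
print(yaml.dump(data, default_flow_style=None))
# A: a
# B: {C: c, D: d, E: e}

# Block style everywhere:
print(yaml.dump(data, default_flow_style=False))
# A: a
# B:
#   C: c
#   D: d
#   E: e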

Answer 1

Link to the PyYAML documentation showing the difference for the default_flow_style parameter. To write it to a file in block mode (often more readable):

d = {'A':'a', 'B':{'C':'c', 'D':'d', 'E':'e'}}
with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False)

produces:

A: a
B:
  C: c
  D: d
  E: e

Check whether a variable is a dataframe

Question: Check whether a variable is a dataframe

When my function f is called with a variable, I want to check whether var is a pandas dataframe:

def f(var):
    if var == pd.DataFrame():
        print "do stuff"

I guess the solution might be quite simple but even with

def f(var):
    if var.values != None:
        print "do stuff"

I can’t get it to work as expected.


Answer 0

Use isinstance, nothing else:

if isinstance(x, pd.DataFrame):
    ... # do something

PEP8 says explicitly that isinstance is the preferred way to check types

No:  type(x) is pd.DataFrame
No:  type(x) == pd.DataFrame
Yes: isinstance(x, pd.DataFrame)

And don’t even think about

if obj.__class__.__name__ == 'DataFrame':
    expect_problems_some_day()

isinstance handles inheritance (see What are the differences between type() and isinstance()?). For example, it will tell you if a variable is a string (either str or unicode), because both derive from basestring:

if isinstance(obj, basestring):
    i_am_string(obj)

Specifically for pandas DataFrame objects:

import pandas as pd
isinstance(var, pd.DataFrame)

Answer 1

Use the built-in isinstance() function.

import pandas as pd

def f(var):
    if isinstance(var, pd.DataFrame):
        print("do stuff")

How to convert a boolean array to an int array

Question: How to convert a boolean array to an int array

I use Scilab, and want to convert an array of booleans into an array of integers:

>>> x = np.array([4, 3, 2, 1])
>>> y = 2 >= x
>>> y
array([False, False,  True,  True], dtype=bool)

In Scilab I can use:

>>> bool2s(y)
0.    0.    1.    1.  

or even just multiply it by 1:

>>> 1*y
0.    0.    1.    1.  

Is there a simple command for this in Python, or would I have to use a loop?


Answer 0

Numpy arrays have an astype method. Just do y.astype(int).

Note that it might not even be necessary to do this, depending on what you’re using the array for. Bool will be autopromoted to int in many cases, so you can add it to int arrays without having to explicitly convert it:

>>> x
array([ True, False,  True], dtype=bool)
>>> x + [1, 2, 3]
array([2, 2, 4])

Answer 1

The 1*y method works in Numpy too:

>>> import numpy as np
>>> x = np.array([4, 3, 2, 1])
>>> y = 2 >= x
>>> y
array([False, False,  True,  True], dtype=bool)
>>> 1*y                      # Method 1
array([0, 0, 1, 1])
>>> y.astype(int)            # Method 2
array([0, 0, 1, 1]) 

If you are asking for a way to convert Python lists from Boolean to int, you can use map to do it:

>>> testList = [False, False,  True,  True]
>>> map(lambda x: 1 if x else 0, testList)
[0, 0, 1, 1]
>>> map(int, testList)
[0, 0, 1, 1]

Or using list comprehensions:

>>> testList
[False, False, True, True]
>>> [int(elem) for elem in testList]
[0, 0, 1, 1]

Answer 2

Using numpy, you can do:

y = x.astype(int)

If you were using a non-numpy array, you could use a list comprehension:

y = [int(val) for val in x]

Answer 3

Most of the time you don’t need conversion:

>>> np.array([True, True, False, False]) + np.array([1, 2, 3, 4])
array([2, 3, 3, 4])

The right way to do it is:

yourArray.astype(int)

or

yourArray.astype(float)

Answer 4

I know you asked for non-looping solutions, but the only solutions I can come up with probably loop internally anyway:

map(int,y)

or:

[i*1 for i in y]

or:

import numpy
y=numpy.array(y)
y*1

Answer 5

A funny way to do this is

>>> np.array([True, False, False]) + 0 
array([1, 0, 0])

Can an iterator be reset in Python?

Question: Can an iterator be reset in Python?

Can I reset an iterator / generator in Python? I am using DictReader and would like to reset it to the beginning of the file.


Answer 0

I see many answers suggesting itertools.tee, but that’s ignoring one crucial warning in the docs for it:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

Basically, tee is designed for those situations where two (or more) clones of one iterator, while “getting out of sync” with each other, don’t do so by much; rather, they stay in the same “vicinity” (a few items behind or ahead of each other). Not suitable for the OP’s problem of “redo from the start”.

L = list(DictReader(...)) on the other hand is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new “iterator from the start” (very lightweight and low-overhead) can be made at any time with iter(L), and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.

As several answers rightly remarked, in the specific case of csv you can also .seek(0) the underlying file object (a rather special case). I’m not sure that’s documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, for which the list I recommend as the general approach would have too large a memory footprint.
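
As a minimal sketch of that list-based approach (the file name here is hypothetical):

import csv

# Materialize all the rows once...
with open('data.csv') as f:
    L = list(csv.DictReader(f))

# ...then make as many fresh, low-overhead iterators as you like.
it1 = iter(L)
next(it1)                 # consume a few items from the first iterator

it2 = iter(L)             # a brand-new "iterator from the start"
assert next(it2) == L[0]  # unaffected by how far it1 has advanced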


Answer 1

If you have a csv file named ‘blah.csv’ that looks like

a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6

you know that you can open the file for reading, and create a DictReader with

blah = open('blah.csv', 'r')
reader= csv.DictReader(blah)

Then, you will be able to get the next line with reader.next(), which should output

{'a':1,'b':2,'c':3,'d':4}

using it again will produce

{'a':2,'b':3,'c':4,'d':5}

However, at this point if you use blah.seek(0), the next time you call reader.next() you will get

{'a':1,'b':2,'c':3,'d':4}

again.

This seems to be the functionality you’re looking for. I’m sure there are some tricks associated with this approach that I’m not aware of, however. @Brian suggested simply creating another DictReader. This won’t work if your first reader is halfway through reading the file, as your new reader will have unexpected keys and values from wherever you are in the file.


Answer 2

No. Python’s iterator protocol is very simple, and only provides one single method (.next() or __next__()), and no method to reset an iterator in general.

The common pattern is to instead create a new iterator using the same procedure again.
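
For instance, a list is iterable any number of times, so “resetting” is just a matter of asking for a new iterator:

nums = [1, 2, 3]
it = iter(nums)
list(it)         # [1, 2, 3]; 'it' is now exhausted
it = iter(nums)  # create a new iterator over the same data
next(it)         # 1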

If you want to “save off” an iterator so that you can go back to its beginning, you may also fork the iterator by using itertools.tee


Answer 3

Yes, if you use numpy.nditer to build your iterator.

>>> lst = [1,2,3,4,5]
>>> itr = numpy.nditer([lst])
>>> itr.next()
1
>>> itr.next()
2
>>> itr.finished
False
>>> itr.reset()
>>> itr.next()
1

Answer 4

There’s a bug in using .seek(0) as advocated by Alex Martelli and Wilduck above, namely that the next call to .next() will give you a dictionary of your header row in the form of {key1:key1, key2:key2, ...}. The workaround is to follow file.seek(0) with a call to reader.next() to get rid of the header row.

So your code would look something like this:

f_in = open('myfile.csv','r')
reader = csv.DictReader(f_in)

for record in reader:
    if some_condition:
        # reset reader to first row of data on 2nd line of file
        f_in.seek(0)
        reader.next()
        continue
    do_something(record)

Answer 5

This is perhaps orthogonal to the original question, but one could wrap the iterator in a function that returns the iterator.

def get_iter():
    return iterator

To reset the iterator, just call the function again. This is of course trivial when the function takes no arguments.

In the case that the function requires some arguments, use functools.partial to create a closure that can be passed instead of the original iterator.

def get_iter(arg1, arg2):
   return iterator
from functools import partial
iter_clos = partial(get_iter, a1, a2)

This seems to avoid the caching that tee (n copies) or list (1 copy) would need to do.
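
For illustration, here is a concrete (hypothetical) version of the sketch above, where the factory builds a genuinely fresh iterator on each call:

from functools import partial

def get_iter(start, stop):
    # build a brand-new iterator on every call
    return iter(range(start, stop))

iter_clos = partial(get_iter, 0, 3)

list(iter_clos())  # [0, 1, 2]
list(iter_clos())  # [0, 1, 2] again -- each call yields a fresh iterator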


Answer 6

For small files, you may consider using more_itertools.seekable – a third-party tool that offers resetting iterables.

Demo

import csv

import more_itertools as mit


filename = "data/iris.csv"
with open(filename, "r") as f:
    reader = csv.DictReader(f)
    iterable = mit.seekable(reader)                    # 1
    print(next(iterable))                              # 2
    print(next(iterable))
    print(next(iterable))

    print("\nReset iterable\n--------------")
    iterable.seek(0)                                   # 3
    print(next(iterable))
    print(next(iterable))
    print(next(iterable))

Output

{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}

Reset iterable
--------------
{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}

Here a DictReader is wrapped in a seekable object (1) and advanced (2). The seek() method is used to reset/rewind the iterator to the 0th position (3).

Note: memory consumption grows with iteration, so be wary of applying this tool to large files, as indicated in the docs.


Answer 7

While there is no iterator reset, the “itertools” module from python 2.6 (and later) has some utilities that can help there. One of them is the “tee” which can make multiple copies of an iterator, and cache the results of the one running ahead, so that these results are used on the copies. It will serve your purposes:

>>> def printiter(n):
...   for i in xrange(n):
...     print "iterating value %d" % i
...     yield i

>>> from itertools import tee
>>> a, b = tee(printiter(5), 2)
>>> list(a)
iterating value 0
iterating value 1
iterating value 2
iterating value 3
iterating value 4
[0, 1, 2, 3, 4]
>>> list(b)
[0, 1, 2, 3, 4]

Answer 8

For DictReader:

f = open(filename, "rb")
d = csv.DictReader(f, delimiter=",")

f.seek(0)
d.__init__(f, delimiter=",")

For DictWriter:

f = open(filename, "rb+")
d = csv.DictWriter(f, fieldnames=fields, delimiter=",")

f.seek(0)
f.truncate(0)
d.__init__(f, fieldnames=fields, delimiter=",")
d.writeheader()
f.flush()

Answer 9

list(generator()) returns all remaining values for a generator and effectively resets it if it is not looped.
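
A short illustration of what that means in practice (exhausting one generator object, then calling the generator function again for a fresh one):

def gen():
    for i in (1, 2, 3):
        yield i

g = gen()
next(g)      # 1
list(g)      # [2, 3] -- the remaining values; g is now exhausted
list(gen())  # [1, 2, 3] -- a new call gives a fresh generator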


Answer 10

Problem

I’ve had the same issue before. After analyzing my code, I realized that attempting to reset the iterator inside of loops slightly increases the time complexity and it also makes the code a bit ugly.

Solution

Open the file and save the rows to a variable in memory.

# initialize list of rows
rows = []

# open the file and temporarily name it as 'my_file'
with open('myfile.csv', 'rb') as my_file:

    # set up the reader using the opened file
    myfilereader = csv.DictReader(my_file)

    # loop through each row of the reader
    for row in myfilereader:
        # add the row to the list of rows
        rows.append(row)

Now you can loop through rows anywhere in your scope without dealing with an iterator.


Answer 11

One possible option is to use itertools.cycle(), which will allow you to iterate indefinitely without any trick like .seek(0).

iterDic = itertools.cycle(csv.DictReader(open('file.csv')))
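
Keep in mind that cycle() caches the rows internally and never stops, so a plain for loop over it will not terminate; bounding it, for example with itertools.islice, is one way to use it safely (a sketch, assuming a file named file.csv):

import csv
import itertools

iterDic = itertools.cycle(csv.DictReader(open('file.csv')))

# take the first 10 rows, wrapping around to the start if the file is shorter
for row in itertools.islice(iterDic, 10):
    print(row)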

Answer 12

I’m arriving at this same issue – while I like the tee() solution, I don’t know how big my files are going to be and the memory warnings about consuming one first before the other are putting me off adopting that method.

Instead, I’m creating a pair of iterators using iter() statements, and using the first for my initial run-through, before switching to the second one for the final run.

So, in the case of a dict-reader, if the reader is defined using:

d = csv.DictReader(f, delimiter=",")

I can create a pair of iterators from this “specification” – using:

d1, d2 = iter(d), iter(d)

I can then run my 1st-pass code against d1, safe in the knowledge that the second iterator d2 has been defined from the same root specification.

I’ve not tested this exhaustively, but it appears to work with dummy data.


Answer 13

Only if the underlying type provides a mechanism for doing so (e.g. fp.seek(0)).


Answer 14

Return a newly created iterator at the last iteration during the ‘iter()’ call

class ResetIter: 
  def __init__(self, num):
    self.num = num
    self.i = -1

  def __iter__(self):
    if self.i == self.num-1: # here, return the new object
      return self.__class__(self.num) 
    return self

  def __next__(self):
    if self.i == self.num-1:
      raise StopIteration

    if self.i <= self.num-1:
      self.i += 1
      return self.i


reset_iter = ResetIter(10)
for i in reset_iter:
  print(i, end=' ')
print()

for i in reset_iter:
  print(i, end=' ')
print()

for i in reset_iter:
  print(i, end=' ')

Output:

0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 

Concatenate a list of pandas dataframes together

Question: Concatenate a list of pandas dataframes together

I have a list of Pandas dataframes that I would like to combine into one Pandas dataframe. I am using Python 2.7.10 and Pandas 0.16.2

I created the list of dataframes from:

import pandas as pd
dfs = []
sqlall = "select * from mytable"

for chunk in pd.read_sql_query(sqlall , cnxn, chunksize=10000):
    dfs.append(chunk)

This returns a list of dataframes

type(dfs[0])
Out[6]: pandas.core.frame.DataFrame

type(dfs)
Out[7]: list

len(dfs)
Out[8]: 408

Here is some sample data

# sample dataframes
d1 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one' : [5., 6., 7., 8.], 'two' : [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one' : [15., 16., 17., 18.], 'two' : [19., 10., 11., 12.]})

# list of dataframes
mydfs = [d1, d2, d3]

I would like to combine d1, d2, and d3 into one pandas dataframe. Alternatively, a method of reading a large-ish table directly into a dataframe when using the chunksize option would be very helpful.


Answer 0

Given that all the dataframes have the same columns, you can simply concat them:

import pandas as pd
df = pd.concat(list_of_dataframes)
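
One detail worth noting: chunks read with chunksize each carry their own 0-based index, so the concatenated frame will contain duplicate index labels. Passing ignore_index=True to pd.concat renumbers the rows instead:

import pandas as pd

d1 = pd.DataFrame({'one': [1., 2.], 'two': [4., 3.]})
d2 = pd.DataFrame({'one': [5., 6.], 'two': [9., 10.]})

# without ignore_index the result's index would be 0, 1, 0, 1
df = pd.concat([d1, d2], ignore_index=True)  # index becomes 0, 1, 2, 3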

Answer 1

If the dataframes DO NOT all have the same columns, try the following:

df = pd.DataFrame.from_dict(map(dict,df_list))

Answer 2

You can also do it with functional programming:

from functools import reduce
reduce(lambda df1, df2: df1.merge(df2, "outer"), mydfs)

Answer 3

concat also works nicely with a list comprehension pulled using the “loc” command against an existing dataframe

df = pd.read_csv('./data.csv') # ie; Dataframe pulled from csv file with a "userID" column

review_ids = ['1','2','3'] # ie; ID values to grab from DataFrame

# Gets rows in df where IDs match in the userID column and combines them 

dfa = pd.concat([df.loc[df['userID'] == x] for x in review_ids])

Should I add the Django migration files to the .gitignore file?

Question: Should I add the Django migration files to the .gitignore file?

Should I be adding the Django migration files in the .gitignore file?

I’ve recently been getting a lot of git issues due to migration conflicts and was wondering if I should be marking migration files as ignore.

If so, how would I go about adding all of the migrations that I have in my apps, and adding them to the .gitignore file?


Answer 0

Quoting from the Django migrations documentation:

The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.

If you follow this process, you shouldn’t be getting any merge conflicts in the migration files.

When merging version control branches, you still may encounter a situation where you have multiple migrations based on the same parent migration, e.g. if two different developers introduced a migration concurrently. One way of resolving this situation is to introduce a merge migration. Often this can be done automatically with the command

./manage.py makemigrations --merge

which will introduce a new migration that depends on all current head migrations. Of course this only works when there is no conflict between the head migrations, in which case you will have to resolve the problem manually.


Given that some people here suggested that you shouldn’t commit your migrations to version control, I’d like to expand on the reasons why you actually should do so.

First, you need a record of the migrations applied to your production systems. If you deploy changes to production and want to migrate the database, you need a description of the current state. You can create a separate backup of the migrations applied to each production database, but this seems unnecessarily cumbersome.

Second, migrations often contain custom, handwritten code. It’s not always possible to automatically generate them with ./manage.py makemigrations.

Third, migrations should be included in code review. They are significant changes to your production system, and there are lots of things that can go wrong with them.

So in short, if you care about your production data, please check your migrations into version control.


Answer 1

You can follow the below process.

You can run makemigrations locally and this creates the migration file. Commit this new migration file to the repo.

In my opinion you should not run makemigrations in production at all. You can run migrate in production and you will see the migrations are applied from the migration file that you committed from local. This way you can avoid all conflicts.

IN LOCAL ENV, to create the migration files,

python manage.py makemigrations 
python manage.py migrate

Now commit these newly created files, something like below.

git add app/migrations/...
git commit -m 'add migration files' app/migrations/...

IN PRODUCTION ENV, run only the below command.

python manage.py migrate

Answer 2

Quote from the 2018 docs, Django 2.0. (two separate commands = makemigrations and migrate)

The reason that there are separate commands to make and apply migrations is because you’ll commit migrations to your version control system and ship them with your app; they not only make your development easier, they’re also useable by other developers and in production.

https://docs.djangoproject.com/en/2.0/intro/tutorial02/


Answer 3

TL;DR: commit migrations, resolve migration conflicts, adjust your git workflow.

Feels like you’d need to adjust your git workflow, instead of ignoring conflicts.

Ideally, every new feature is developed in a different branch, and merged back with a pull request.

PRs cannot be merged if there’s a conflict, therefore whoever needs to merge their feature needs to resolve the conflict, migrations included. This might need coordination between different teams.

It is important though to commit migration files! If a conflict arises, Django might even help you solve those conflicts ;)


Answer 4

I can’t imagine why you would be getting conflicts, unless you’re editing the migrations somehow? That usually ends badly – if someone misses some intermediate commits then they won’t be upgrading from the correct version, and their copy of the database will be corrupted.

The process that I follow is pretty simple – whenever you change the models for an app, you also commit a migration, and then that migration doesn’t change – if you need something different in the model, then you change the model and commit a new migration alongside your changes.

In greenfield projects, you can often delete the migrations and start over from scratch with a 0001_ migration when you release, but if you have production code, then you can’t (though you can squash migrations down into one).


Answer 5

The solution usually used, is that, before anything is merged into master, the developer must pull any remote changes. If there’s a conflict in migration versions, he should rename his local migration (the remote one has been run by other devs, and, potentially, in production), to N+1.

During development it might be okay to just not-commit migrations (don’t add an ignore though, just don’t add them). But once you’ve gone into production, you’ll need them in order to keep the schema in sync with model changes.

You then need to edit the file, and change the dependencies to the latest remote version.
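
For illustration, a hypothetical renamed migration whose dependencies have been pointed at the latest remote migration (the app, file and field names here are made up):

# app/migrations/0005_add_field.py  (renamed from 0004_add_field.py)
from django.db import migrations, models

class Migration(migrations.Migration):

    dependencies = [
        # edited to depend on the migration that arrived from the remote
        ('app', '0004_remote_change'),
    ]

    operations = [
        migrations.AddField('mymodel', 'myfield', models.IntegerField(default=0)),
    ]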

This works for Django migrations, as well as other similar apps (sqlalchemy+alembic, RoR, etc).


Answer 6

Having a bunch of migration files in git is messy. There is only one file in the migrations folder that you should not ignore: the __init__.py file. If you ignore it, python will no longer look for submodules inside the directory, so any attempts to import the modules will fail. So the question should be: how do I ignore all migration files but __init__.py? The solution is: add ‘0*.py’ to your .gitignore file and it does the job perfectly.

Hope this helps someone.


Answer 7

Gitignore the migrations if you have separate DBs for the Development, Staging and Production environments. For dev purposes you can use a local sqlite DB and play with migrations locally. I would recommend that you create four additional branches:

  1. Master – Clean fresh code without migrations. Nobody is connected to this branch. Used for code reviews only

  2. Development – daily development. Push/pull accepted. Each developer is working on sqlite DB

  3. Cloud_DEV_env – remote cloud/server DEV environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Dev database

  4. Cloud_STAG_env – remote cloud/server STAG environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Stag database

  5. Cloud_PROD_env – remote cloud/server DEV environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Prod database

Notes: 2, 3, 4 – migrations can be kept in repos but there should be strict rules of pull requests merging, so we decided to find a person, responsible for deployments, so the only guy who has all the migration files – our deploy-er. He keeps the remote DB migrations each time we have any changes in Models.


Answer 8

Short answer: I propose excluding migrations from the repo. After a code merge, just run ./manage.py makemigrations and you are all set.

Long answer: I don’t think you should put migration files into the repo. It will spoil the migration states in other people’s dev environments and in other prod and stage environments. (Refer to Sugar Tang’s comment for examples.)

From my point of view, the purpose of Django migrations is to find the gap between the previous model state and the new model state, and then serialise that gap. If your model changes after a code merge, you can simply run makemigrations to find the gap. Why would you want to manually and carefully merge other migrations when you can achieve the same automatically and bug-free? The Django documentation says,

They (migrations) are designed to be mostly automatic

; please keep it that way. To merge migrations manually, you have to fully understand what others have changed and any dependencies among those changes. That’s a lot of overhead and error-prone. So tracking the models file is sufficient.

It is a good topic on the workflow. I am open to other options.


In Python, how can I load YAML mappings as OrderedDicts?

Question: In Python, how can I load YAML mappings as OrderedDicts?

I’d like to get PyYAML‘s loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.

What’s the best way to do that?


Answer 0

Update: In python 3.6+ you probably don’t need OrderedDict at all due to the new dict implementation that has been in use in pypy for some time (although considered CPython implementation detail for now).

Update: In python 3.7+, the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec, see What’s New In Python 3.7.

I like @James’ solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. Especially, when writing library code this is a bad idea. Also, it doesn’t directly work with yaml.safe_load().

Fortunately, the solution can be improved without much effort:

import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

# usage example:
ordered_load(stream, yaml.SafeLoader)

For serialization, I don’t know an obvious generalization, but at least this shouldn’t have any side effects:

def ordered_dump(data, stream=None, Dumper=yaml.Dumper, **kwds):
    class OrderedDumper(Dumper):
        pass
    def _dict_representer(dumper, data):
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            data.items())
    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)

# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)

Answer 1

The yaml module allows you to specify custom ‘representers’ to convert Python objects to text and ‘constructors’ to reverse the process.

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

def dict_representer(dumper, data):
    return dumper.represent_dict(data.iteritems())

def dict_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))

yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
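
With both hooks registered, a round trip might look like this (a sketch; note that iteritems() above is Python 2, so use data.items() on Python 3):

import collections
import yaml

d = collections.OrderedDict([('b', 1), ('a', 2)])
text = yaml.dump(d)       # keys are emitted in insertion order: b, a
loaded = yaml.load(text)  # parsed back as a collections.OrderedDict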

Answer 2

2018 option:

oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:

import oyaml as yaml

You’ll no longer be annoyed by screwed-up mappings when dumping/loading.

Note: I’m the author of oyaml.


Answer 3

2015 (and later) option:

ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it will also preserve comments, anchor names and tags, and it supports the YAML 1.2 specification (released 2009).

The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper¹:

import sys
from ruamel.yaml import YAML

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)

will give you:

3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else

data is of type CommentedMap which functions like a dict, but has extra information that is kept around until being dumped (including the preserved comment!)


Answer 4

Note: there is a library, based on the following answer, which implements also the CLoader and CDumpers: Phynix/yamlloader

I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.

import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError, exc:
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping

Answer 5

Update: the library was deprecated in favor of the yamlloader (which is based on the yamlordereddictloader)

I’ve just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:

import yaml
import yamlordereddictloader

datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)

Answer 6

On my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. It now supports the object_pairs_hook option for the load commands. A diff of the changes I made is below.

__init__.py

$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)

constructor.py

$ diff constructor.py Original
20,21c20
<     def __init__(self, object_pairs_hook=dict):
<         self.object_pairs_hook = object_pairs_hook
---
>     def __init__(self):
27,29d25
<     def create_object_hook(self):
<         return self.object_pairs_hook()
<
54,55c50,51
<         self.constructed_objects = self.create_object_hook()
<         self.recursive_objects = self.create_object_hook()
---
>         self.constructed_objects = {}
>         self.recursive_objects = {}
129c125
<         mapping = self.create_object_hook()
---
>         mapping = {}
400c396
<         data = self.create_object_hook()
---
>         data = {}
595c591
<             dictitems = self.create_object_hook()
---
>             dictitems = {}
602c598
<             dictitems = value.get('dictitems', self.create_object_hook())
---
>             dictitems = value.get('dictitems', {})

loader.py

$ diff loader.py Original
13c13
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
18c18
<         BaseConstructor.__init__(self, **constructKwds)
---
>         BaseConstructor.__init__(self)
23c23
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
28c28
<         SafeConstructor.__init__(self, **constructKwds)
---
>         SafeConstructor.__init__(self)
33c33
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
38c38
<         Constructor.__init__(self, **constructKwds)
---
>         Constructor.__init__(self)

Answer 7

Here’s a simple solution that also checks for duplicated top-level keys in your map.

import yaml
import re
from collections import OrderedDict

def yaml_load_od(fname):
    "load a yaml file as an OrderedDict"
    # detects any duped keys (fail on this) and preserves order of top level keys
    with open(fname, 'r') as f:
        lines = f.read().splitlines()
        top_keys = []
        duped_keys = []
        for line in lines:
            m = re.search(r'^([A-Za-z0-9_]+) *:', line)
            if m:
                if m.group(1) in top_keys:
                    duped_keys.append(m.group(1))
                else:
                    top_keys.append(m.group(1))
        if duped_keys:
            raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass to set up the OrderedDict
    with open(fname, 'r') as f:
        d_tmp = yaml.load(f)
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])

How do I install pip for Python 3 on Mac OS X?

Question: How do I install pip for Python 3 on Mac OS X?

OS X (Mavericks) has Python 2.7 stock installed. But I do all my own personal Python stuff with 3.3. I just flushed my 3.3.2 install and installed the new 3.3.3. So I need to install pyserial again. I can do it the way I’ve done it before, which is:

  1. Download pyserial from pypi
  2. untar pyserial.tgz
  3. cd pyserial
  4. python3 setup.py install

But I’d like to do like the cool kids do, and just do something like pip3 install pyserial. But it’s not clear how I get to that point. And just that point. Not interested (unless I have to be) in virtualenv yet.


Answer 0

UPDATE: This is no longer necessary with Python3.4. It installs pip3 as part of the stock install.

I ended up posting this same question on the python mailing list, and got the following answer:

# download and install setuptools
curl -O https://bootstrap.pypa.io/ez_setup.py
python3 ez_setup.py
# download and install pip
curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py

This solved my question perfectly. After adding the following of my own:

cd /usr/local/bin
ln -s ../../../Library/Frameworks/Python.framework/Versions/3.3/bin/pip pip

So that I could run pip directly, I was able to:

# use pip to install
pip install pyserial

or:

# Don't want it?
pip uninstall pyserial

Answer 1

I had to go through this process myself and chose a different way that I think is better in the long run.

I installed homebrew

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

then:

brew doctor

The last step gives you some warnings and errors that you have to resolve. One of those will be to download and install the Mac OS X command-line tools.

then:

brew install python3

This gave me python3 and pip3 in my path.

pieter$ which pip3 python3
/usr/local/bin/pip3
/usr/local/bin/python3

回答 2

在Mac上安装Python3

1. brew install python3
2. curl https://bootstrap.pypa.io/get-pip.py | python3
3. python3

使用pip3安装模块

1. pip3 install ipython
2. python3 -m IPython

:)

Install Python3 on mac

1. brew install python3
2. curl https://bootstrap.pypa.io/get-pip.py | python3
3. python3

Use pip3 to install modules

1. pip3 install ipython
2. python3 -m IPython

:)


回答 3

另外:当您在python3下安装requests时,命令为:

pip3 install requests

而不是:

pip install requests

Plus: when you install requests with python3, the command is:

pip3 install requests

not

pip install requests

回答 4

  1. brew install python3
  2. 在您的shell配置文件中创建别名

    • 例如,在我的.zshrc中添加alias pip3="python3 -m pip"

➜ ~ pip3 --version

来自/usr/local/lib/python3.6/site-packages(python 3.6)的pip 9.0.1

  1. brew install python3
  2. create alias in your shell profile

    • eg. alias pip3="python3 -m pip" in my .zshrc

➜ ~ pip3 --version

pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)


回答 5

这是我的简单解决方案:

如果您的系统中同时安装了python2和python3,则默认情况下pip升级将指向python2。因此,我们必须指定python(python3)的版本并使用以下命令:

python3 -m pip install --upgrade pip

此命令将卸载以前安装的pip并安装新版本,从而完成pip的升级。

这将节省内存并让您的系统更整洁。

图像-在MacOS上如何在Python3中升级pip

Here is my simple solution:

If you have python2 and python3 both installed in your system, the pip upgrade will point to python2 by default. Hence, we must specify the version of python(python3) and use the below command:

python3 -m pip install --upgrade pip

This command will uninstall the previously installed pip and install the new version- upgrading your pip.

This will save memory and declutter your system.

Image – How the upgrading of pip in Python3 works on MacOS
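
If you are ever unsure which interpreter a given pip command belongs to, a small check like the following sketch can help (nothing Mac-specific; run it under whichever python you want to inspect):

import sys

# shows which interpreter is actually running, so you can tell
# whether `pip` / `pip3` resolve to python2 or python3
print(sys.executable)
print(sys.version)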


回答 6

使用Python EasyInstall(我想这正是您想用的)非常简单!

sudo easy_install pip

然后,要用pip安装Pyserial,您可以执行以下操作:

pip install pyserial

To use Python EasyInstall (which is what I think you’re wanting to use), is super easy!

sudo easy_install pip

so then with pip to install Pyserial you would do:

pip install pyserial

回答 7

另外,值得一提的是Mac OS X / macOS用户可以直接使用Homebrew安装pip3。

$> brew update
$> brew install python3
$> pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

Also, it’s worth mentioning that Mac OS X/macOS users can just use Homebrew to install pip3.

$> brew update
$> brew install python3
$> pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

回答 8

在Mac OS X Mojave上,python代表2.7版本的Python,python3代表3版本的Python。pip和pip3同理。所以,要为Python 3升级pip,请执行:

~$ sudo pip3 install --upgrade pip

On Mac OS X Mojave python stands for python of version 2.7 and python3 for python of version 3. The same is pip and pip3. So, to upgrade pip for python 3 do this:

~$ sudo pip3 install --upgrade pip

回答 9

在MacOS 10.12上

下载pip:get-pip.py

下载python3:python3

  1. 安装python3
  2. 打开终端: python3 get-pip.py
  3. pip3 可用

On MacOS 10.12

download pip: pip as get-pip.py

download python3: python3

  1. install python3
  2. open terminal: python3 get-pip.py
  3. pip3 is available

回答 10

使用brew安装python3时,pip3会自动装好:

  1. brew install python3
  2. pip3 --version

pip3 is installed automatically with python3 using brew:

  1. brew install python3
  2. pip3 --version

回答 11

如果您的Mac上未安装pip,则只需在终端上运行以下命令即可。

sudo easy_install pip

在此处下载python 3: python3

完成这两个步骤后,请确保运行以下命令以验证是否已成功安装它们。

python3 --version
pip3 --version

Simply run the following in the terminal if you don’t have pip installed on your Mac.

sudo easy_install pip

download python 3 here: python3

once you’re done with these 2 steps, make sure to run the following to verify whether you’ve installed them successfully.

python3 --version
pip3 --version

回答 12

对于全新的Mac,您需要执行以下步骤:-

  1. 确保已安装 Xcode
  2. sudo easy_install pip
  3. /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  4. brew doctor
  5. brew doctor
  6. brew install python3

完成后,只需在终端上键入python3,就会看到python 3已经装好。

For a fresh new Mac, you need to follow below steps:-

  1. Make sure you have installed Xcode
  2. sudo easy_install pip
  3. /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  4. brew doctor
  5. brew doctor
  6. brew install python3

And you are done, just type python3 on terminal and you will see python 3 installed.


回答 13

我在python3和pip3上遇到了同样的问题。解决办法:在执行以下命令时,把所有与链接等相关的冲突都解决掉:

brew doctor

之后

brew reinstall python3

I had the same problem with python3 and pip3. Decision: solving all conflicts with links and other stuff when do

brew doctor

After that

brew reinstall python3

比“无法解码JSON对象”显示更好的错误消息

问题:比“无法解码JSON对象”显示更好的错误消息

Python代码可从一些冗长而复杂的JSON文件加载数据:

with open(filename, "r") as f:
  data = json.loads(f.read())

(注意:最佳代码版本应为:

with open(filename, "r") as f:
  data = json.load(f)

但两者都表现出相似的行为)

对于许多类型的JSON错误(缺少分隔符,字符串中不正确的反斜杠等),这会打印出一条非常有用的消息,其中包含找到JSON错误的行号和列号。

但是,对于其他类型的JSON错误(包括经典的“在列表中的最后一项后面使用逗号”,以及诸如true/false首字母大写之类的其他问题),Python的输出仅为:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

对于这种类型的ValueError,如何让Python告诉您JSON文件中的错误在哪里?

Python code to load data from some long complicated JSON file:

with open(filename, "r") as f:
  data = json.loads(f.read())

(note: the best code version should be:

with open(filename, "r") as f:
  data = json.load(f)

but both exhibit similar behavior)

For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.

However, for other types of JSON error (including the classic “using comma on the last item in a list”, but also other things like capitalising true/false), Python’s output is just:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

For that type of ValueError, how do you get Python to tell you where is the error in the JSON file?


回答 0

我发现,在内置json模块报错含糊不清的许多情况下,simplejson模块会给出描述性更强的错误。例如,对于列表中最后一项之后的逗号:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

这条信息没什么描述性。用simplejson执行相同的操作:

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

好多了!对其他常见错误(例如首字母大写的True)也是如此。

I’ve found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

which is not very descriptive. The same operation with simplejson:

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

Much better! Likewise for other common errors like capitalizing True.
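
To act on that extra detail programmatically, you can catch simplejson’s JSONDecodeError, which carries the position of the failure. A minimal sketch (the file name and helper are hypothetical):

import simplejson

def load_json_verbose(path):
    # re-raise with the line/column info that simplejson attaches
    with open(path) as f:
        try:
            return simplejson.load(f)
        except simplejson.JSONDecodeError as e:
            raise ValueError('%s: %s (line %d, column %d)'
                             % (path, e.msg, e.lineno, e.colno))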


回答 1

您无法让python告诉您JSON错在哪里。您需要使用类似这样的在线linter工具。

它会向您显示您尝试解码的JSON中的错误。

You won’t be able to get python to tell you where the JSON is incorrect. You will need to use a linter online somewhere like this

This will show you error in the JSON you are trying to decode.


回答 2

您可以尝试这里的rson库:http://code.google.com/p/rson/ 。它也在PyPI上:https://pypi.python.org/pypi/rson/0.9 ,所以您可以使用easy_install或pip来获取它。

对于tom给出的示例:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON被设计为JSON的超集,因此它可以解析JSON文件。它还有一种替代语法,对人类来说阅读和编辑都友好得多。我在输入文件中大量使用它。

至于布尔值的大小写:rson似乎会把大小写不正确的布尔值当作字符串读取。

>>> rson.loads('[true,False]')
[True, u'False']

You could try the rson library found here: http://code.google.com/p/rson/ . It is also up on PyPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.

for the example given by tom:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON is a designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.

As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.

>>> rson.loads('[true,False]')
[True, u'False']

回答 3

我遇到过类似的问题,原因是使用了单引号。JSON标准(http://json.org)只提到使用双引号,所以python的json库想必也只支持双引号。

I had a similar problem and it was due to singlequotes. The JSON standard(http://json.org) talks only about using double quotes so it must be that the python json library supports only double quotes.


回答 4

对于我遇到的这一特定问题,我在packaging.py文件中找到了load_json_file(path)的函数声明,然后在其中偷偷加了一行print:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

这样,它会在进入try-catch之前打印json文件的内容,因此即使我几乎没有Python知识,也能迅速弄清楚为什么我的配置读不了json文件。
(原因是我的文本编辑器被我设置成了写入UTF-8 BOM……愚蠢)

之所以提这一点,是因为它虽然可能不是对OP具体问题的好答案,但却是定位这种非常恼人的bug来源的快捷方法。我敢打赌,很多正在为MalformedJsonFileError: No JSON object could be decoded when reading …寻找更详细解决方案的人会偶然看到这篇文章,这也许能帮到他们。

For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn’t read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)

Just mentioning this because, while maybe not a good answer to the OP’s specific problem, this was a rather quick method in determining the source of a very oppressing bug. And I bet that many people will stumble upon this article who are searching a more verbose solution for a MalformedJsonFileError: No JSON object could be decoded when reading …. So that might help them.


回答 5

就我而言,我的json文件很大,在python中使用普通的json模块时会出现上述错误。

通过sudo pip install simplejson安装simplejson之后,

问题就解决了。

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

As to me, my json file is very large, when use common json in python it gets the above error.

After install simplejson by sudo pip install simplejson.

And then I solved it.

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

回答 6

我有一个类似的问题,这是我的代码:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

问题是我忘了调用file.close(),补上之后问题就解决了。

I had a similar problem this was my code:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

the problem was i had forgotten to file.close() I did it and fixed the problem.


回答 7

可接受的答案是解决问题的最简单方法。但是,如果由于公司政策而不允许您安装simplejson,我建议采用以下解决方案来解决“在列表中的最后一项上使用逗号”这一特定问题:

  1. 创建一个子类“JSONLintCheck”继承自“JSONDecoder”类,并像下面这样覆盖“JSONDecoder”类的__init__方法:

    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)

  2. make_scanner是一个新函数,用于覆盖上述类的“scan_once”方法。代码如下:

    #!/usr/bin/env python
    from json import JSONDecoder
    from json import decoder
    import re

    NUMBER_RE = re.compile(
        r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
        (re.VERBOSE | re.MULTILINE | re.DOTALL))

    def py_make_scanner(context):
        parse_object = context.parse_object
        parse_array = context.parse_array
        parse_string = context.parse_string
        match_number = NUMBER_RE.match
        encoding = context.encoding
        strict = context.strict
        parse_float = context.parse_float
        parse_int = context.parse_int
        parse_constant = context.parse_constant
        object_hook = context.object_hook
        object_pairs_hook = context.object_pairs_hook

        def _scan_once(string, idx):
            try:
                nextchar = string[idx]
            except IndexError:
                # 原先这里是 raise StopIteration
                raise ValueError(decoder.errmsg("Could not get the next character", string, idx))

            if nextchar == '"':
                return parse_string(string, idx + 1, encoding, strict)
            elif nextchar == '{':
                return parse_object((string, idx + 1), encoding, strict,
                    _scan_once, object_hook, object_pairs_hook)
            elif nextchar == '[':
                return parse_array((string, idx + 1), _scan_once)
            elif nextchar == 'n' and string[idx:idx + 4] == 'null':
                return None, idx + 4
            elif nextchar == 't' and string[idx:idx + 4] == 'true':
                return True, idx + 4
            elif nextchar == 'f' and string[idx:idx + 5] == 'false':
                return False, idx + 5

            m = match_number(string, idx)
            if m is not None:
                integer, frac, exp = m.groups()
                if frac or exp:
                    res = parse_float(integer + (frac or '') + (exp or ''))
                else:
                    res = parse_int(integer)
                return res, m.end()
            elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
                return parse_constant('NaN'), idx + 3
            elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
                return parse_constant('Infinity'), idx + 8
            elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
                return parse_constant('-Infinity'), idx + 9
            else:
                # 原先这里是 raise StopIteration,修改之处就在这里
                raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))
        return _scan_once

    make_scanner = py_make_scanner

  3. 最好将“make_scanner”函数与新的子类放进同一个文件里。

The accepted answer is the easiest one to fix the problem. But in case you are not allowed to install the simplejson due to your company policy, I propose below solution to fix the particular issue of “using comma on the last item in a list”:

  1. Create a child class “JSONLintCheck” to inherit from the class “JSONDecoder” and override the __init__ method of “JSONDecoder” like below:

    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)

  2. make_scanner is a new function used to override the ‘scan_once’ method of the above class. Here is the code for it:

    #!/usr/bin/env python
    from json import JSONDecoder
    from json import decoder
    import re

    NUMBER_RE = re.compile(
        r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
        (re.VERBOSE | re.MULTILINE | re.DOTALL))

    def py_make_scanner(context):
        parse_object = context.parse_object
        parse_array = context.parse_array
        parse_string = context.parse_string
        match_number = NUMBER_RE.match
        encoding = context.encoding
        strict = context.strict
        parse_float = context.parse_float
        parse_int = context.parse_int
        parse_constant = context.parse_constant
        object_hook = context.object_hook
        object_pairs_hook = context.object_pairs_hook

        def _scan_once(string, idx):
            try:
                nextchar = string[idx]
            except IndexError:
                # was: raise StopIteration
                raise ValueError(decoder.errmsg("Could not get the next character", string, idx))

            if nextchar == '"':
                return parse_string(string, idx + 1, encoding, strict)
            elif nextchar == '{':
                return parse_object((string, idx + 1), encoding, strict,
                    _scan_once, object_hook, object_pairs_hook)
            elif nextchar == '[':
                return parse_array((string, idx + 1), _scan_once)
            elif nextchar == 'n' and string[idx:idx + 4] == 'null':
                return None, idx + 4
            elif nextchar == 't' and string[idx:idx + 4] == 'true':
                return True, idx + 4
            elif nextchar == 'f' and string[idx:idx + 5] == 'false':
                return False, idx + 5

            m = match_number(string, idx)
            if m is not None:
                integer, frac, exp = m.groups()
                if frac or exp:
                    res = parse_float(integer + (frac or '') + (exp or ''))
                else:
                    res = parse_int(integer)
                return res, m.end()
            elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
                return parse_constant('NaN'), idx + 3
            elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
                return parse_constant('Infinity'), idx + 8
            elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
                return parse_constant('-Infinity'), idx + 9
            else:
                # was: raise StopIteration (this is where the modification is needed)
                raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))
        return _scan_once

    make_scanner = py_make_scanner

  3. It is best to put the ‘make_scanner’ function together with the new child class in the same file.
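
For reference, a minimal usage sketch of the subclass above; decode is the standard JSONDecoder entry point, and the file name is hypothetical:

decoder = JSONLintCheck()
with open('config.json') as f:   # hypothetical input file
    data = decoder.decode(f.read())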

回答 8

刚遇到了同样的问题,就我的情况而言,问题出在文件开头的BOM(字节顺序标记)上。

在我删除UTF BOM标记之前,json.tool甚至拒绝处理空文件(只有一对花括号)。

我所做的是:

  • 用vim打开我的json文件,
  • 删除了字节顺序标记(set nobomb
  • 保存文件

这就解决了json.tool的问题。希望这可以帮助!

Just hit the same issue and in my case the problem was related to BOM (byte order mark) at the beginning of the file.

json.tool would refuse to process even empty file (just curly braces) until i removed the UTF BOM mark.

What I have done is:

  • opened my json file with vim,
  • removed byte order mark (set nobomb)
  • save file

This resolved the problem with json.tool. Hope this helps!
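
If you would rather handle the BOM from Python instead of editing the file, the ‘utf-8-sig’ codec strips it transparently. A minimal sketch (the helper name is hypothetical):

import io
import json

def load_json_skip_bom(path):
    # 'utf-8-sig' decodes UTF-8 and silently drops a leading BOM if present
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return json.load(f)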


回答 9

创建文件时,不要创建内容为空的文件,而是用以下方式代替:

json.dump({}, file)

When your file is created. Instead of creating a file with content is empty. Replace with:

json.dump({}, file)

回答 10

您可以使用cjson,它声称比纯python实现快250倍。考虑到您有“某个冗长而复杂的JSON文件”,并且可能需要运行好几次(解码器失败时只报告它们遇到的第一个错误),这一点很有用。

You could use cjson, that claims to be up to 250 times faster than pure-python implementations, given that you have “some long complicated JSON file” and you will probably need to run it several times (decoders fail and report the first error they encounter only).


如何从.py文件手动生成.pyc文件

问题:如何从.py文件手动生成.pyc文件

由于某种原因,我不能依靠Python的“import”语句来自动生成.pyc文件。

有没有一种方法可以实现以下功能?

def py_to_pyc(py_filepath, pyc_filepath):
    ...

For some reason, I can not depend on Python’s “import” statement to generate .pyc file automatically

Is there a way to implement a function as following?

def py_to_pyc(py_filepath, pyc_filepath):
    ...

回答 0

您可以在终端中使用compileall。以下命令将递归进入子目录,并为找到的所有python文件生成pyc文件。compileall模块是Python标准库的一部分,所以你不需要安装任何额外的东西就能使用它。python2和python3的用法完全相同。

python -m compileall .

You can use compileall in the terminal. The following command will go recursively into sub directories and make pyc files for all the python files it finds. The compileall module is part of the python standard library, so you don’t need to install anything extra to use it. This works exactly the same way for python2 and python3.

python -m compileall .

回答 1

您可以使用以下命令从命令行编译单个或多个文件:

python -m compileall <file_1>.py <file_n>.py

You can compile individual files(s) from the command line with:

python -m compileall <file_1>.py <file_n>.py

回答 2

自从我上一次使用Python已经有一段时间了,但是我相信您可以使用py_compile

import py_compile
py_compile.compile("file.py")

It’s been a while since I last used Python, but I believe you can use py_compile:

import py_compile
py_compile.compile("file.py")

回答 3

我发现几种将python脚本编译成字节码的方法

  1. py_compile在终端中使用:

    python -m py_compile File1.py File2.py File3.py ...

    -m 指定要编译的模块名称。

    或者,用于文件的交互式编译

    python -m py_compile -
    File1.py
    File2.py
    File3.py
       .
       .
       .
  2. 使用py_compile.compile

    import py_compile
    py_compile.compile('YourFileName.py')
  3. 使用py_compile.main()

    它一次编译几个文件。

    import py_compile
    py_compile.main(['File1.py','File2.py','File3.py'])

    该列表可以根据需要增长。或者,您显然可以向main传入一个文件列表,甚至通过命令行参数传入文件名。

    或者,如果您向main传入['-'],则它可以交互式地编译文件。

  4. 使用compileall.compile_dir()

    import compileall
    compileall.compile_dir(direname)

    它编译提供的目录中存在的每个Python文件。

  5. 使用compileall.compile_file()

    import compileall
    compileall.compile_file('YourFileName.py')

看一下下面的链接:

https://docs.python.org/3/library/py_compile.html

https://docs.python.org/3/library/compileall.html

I found several ways to compile python scripts into bytecode

  1. Using py_compile in terminal:

    python -m py_compile File1.py File2.py File3.py ...
    

    -m specifies the module(s) name to be compiled.

    Or, for interactive compilation of files

    python -m py_compile -
    File1.py
    File2.py
    File3.py
       .
       .
       .
    
  2. Using py_compile.compile:

    import py_compile
    py_compile.compile('YourFileName.py')
    
  3. Using py_compile.main():

    It compiles several files at a time.

    import py_compile
    py_compile.main(['File1.py','File2.py','File3.py'])
    

    The list can grow as long as you wish. Alternatively, you can obviously pass a list of files in main or even file names in command line args.

    Or, if you pass ['-'] in main then it can compile files interactively.

  4. Using compileall.compile_dir():

    import compileall
    compileall.compile_dir(direname)
    

    It compiles every single Python file present in the supplied directory.

  5. Using compileall.compile_file():

    import compileall
    compileall.compile_file('YourFileName.py')
    

Take a look at the links below:

https://docs.python.org/3/library/py_compile.html

https://docs.python.org/3/library/compileall.html


回答 4

我会用compileall。它在脚本和命令行中都能很好地工作。它是比前面提到的py_compile更高层的模块/工具,其内部也用到了py_compile。

I would use compileall. It works nicely both from scripts and from the command line. It’s a bit higher level module/tool than the already mentioned py_compile that it also uses internally.
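
A minimal sketch of using it from a script (the directory name is hypothetical):

import compileall

# compile every .py file under 'myproject', recursing into subdirectories;
# force=True recompiles even when the .pyc looks up to date
compileall.compile_dir('myproject', force=True, quiet=1)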


回答 5

在Python2中,您可以使用:

python -m compileall <pythonic-project-name>

这样就可以将既包含包又包含模块的项目中的所有.py文件编译为.pyc文件。


在Python3中,您可以使用:

python3 -m compileall <pythonic-project-name>

这样就可以将既包含包又包含模块的项目中的所有.py文件编译到__pycache__文件夹中。

或者,参考这篇文章中browning的做法:

您可以使用以下命令强制让.pyc文件在文件夹中采用与Python2相同的布局:

python3 -m compileall -b <pythonic-project-name>

-b选项会让.pyc文件输出到其旧式位置(即与Python2中相同)。

In Python2 you could use:

python -m compileall <pythonic-project-name>

which compiles all .py files to .pyc files in a project which contains packages as well as modules.


In Python3 you could use:

python3 -m compileall <pythonic-project-name>

which compiles all .py files to __pycache__ folders in a project which contains packages as well as modules.

Or with browning from this post:

You can enforce the same layout of .pyc files in the folders as in Python2 by using:

python3 -m compileall -b <pythonic-project-name>

The option -b triggers the output of .pyc files to their legacy-locations (i.e. the same as in Python2).


回答 6

为了匹配原始问题要求(源路径和目标路径),代码应如下所示:

import py_compile
py_compile.compile(py_filepath, pyc_filepath)

如果输入代码有错误,并且调用时传入了doraise=True,则会引发py_compile.PyCompileError异常。

To match the original question requirements (source path and destination path) the code should be like that:

import py_compile
py_compile.compile(py_filepath, pyc_filepath)

If the input code has errors and doraise=True is passed, then the py_compile.PyCompileError exception is raised.
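
Putting it together, a minimal sketch of the py_to_pyc function from the question, using py_compile’s cfile keyword for the destination path:

import py_compile

def py_to_pyc(py_filepath, pyc_filepath):
    # doraise=True makes syntax errors surface as py_compile.PyCompileError
    py_compile.compile(py_filepath, cfile=pyc_filepath, doraise=True)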


回答 7

有两种方法可以做到这一点

  1. 命令行
  2. 使用python程序

如果使用命令行,请用python -m compileall <argument>将python代码编译为python字节码。例如:python -m compileall -x ./*

或者,您可以使用下面的代码将您的库编译为字节码。

import compileall
import os

lib_path = "your_lib_path"
build_path = "your-dest_path"

# 编译lib_path下的所有.py文件;legacy=True会把.pyc生成在源文件旁边
compileall.compile_dir(lib_path, force=True, legacy=True)

# 递归地把生成的.pyc文件移动到build_path,并保留目录结构
def compile(cu_path):
    for file in os.listdir(cu_path):
        if os.path.isdir(os.path.join(cu_path, file)):
            compile(os.path.join(cu_path, file))
        elif file.endswith(".pyc"):
            dest = os.path.join(build_path, cu_path, file)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.rename(os.path.join(cu_path, file), dest)

compile(lib_path)

详细文档请参阅☞ docs.python.org

There is two way to do this

  1. Command line
  2. Using python program

If you are using command line use python -m compileall <argument> to compile python code to python binary code. Ex: python -m compileall -x ./*

Or, You can use this code to compile your library into byte-code.

import compileall
import os

lib_path = "your_lib_path"
build_path = "your-dest_path"

# compile every .py file under lib_path; legacy=True writes the .pyc files next to the sources
compileall.compile_dir(lib_path, force=True, legacy=True)

# recursively move the generated .pyc files into build_path, preserving the directory structure
def compile(cu_path):
    for file in os.listdir(cu_path):
        if os.path.isdir(os.path.join(cu_path, file)):
            compile(os.path.join(cu_path, file))
        elif file.endswith(".pyc"):
            dest = os.path.join(build_path, cu_path, file)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.rename(os.path.join(cu_path, file), dest)

compile(lib_path)

look at ☞ docs.python.org for detailed documentation