Tag Archives: Python

How do I write data in YAML format to a file?

Question: How do I write data in YAML format to a file?

I need to write the below data to yaml file using Python:

{A:a, B:{C:c, D:d, E:e}} 

i.e., dictionary in a dictionary. How can I achieve this?


Answer 0

import yaml

data = dict(
    A = 'a',
    B = dict(
        C = 'c',
        D = 'd',
        E = 'e',
    )
)

with open('data.yml', 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)

The default_flow_style=False parameter is necessary to produce the format you want (block style); otherwise, for nested collections, it produces flow style:

A: a
B: {C: c, D: d, E: e}
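
For comparison, here is a minimal sketch of both behaviors. Note that the default changed in PyYAML 5.1: older releases default to flow style for nested collections, while newer ones default to block style, so the flow-style output below is forced explicitly with default_flow_style=None:

import yaml

data = {'A': 'a', 'B': {'C': 'c', 'D': 'd', 'E': 'e'}}

# Flow style for nested collections (the pre-5.1 default):
print(yaml.dump(data, default_flow_style=None))
# A: a
# B: {C: c, D: d, E: e}

# Block style everywhere:
print(yaml.dump(data, default_flow_style=False))
# A: a
# B:
#   C: c
#   D: d
#   E: e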

Answer 1

Link to the PyYAML documentation showing the difference for the default_flow_style parameter. To write it to a file in block mode (often more readable):

d = {'A':'a', 'B':{'C':'c', 'D':'d', 'E':'e'}}
with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False)

produces:

A: a
B:
  C: c
  D: d
  E: e

Check whether a variable is a dataframe

Question: Check whether a variable is a dataframe

When my function f is called with a variable, I want to check whether var is a pandas dataframe:

def f(var):
    if var == pd.DataFrame():
        print "do stuff"

I guess the solution might be quite simple but even with

def f(var):
    if var.values != None:
        print "do stuff"

I can’t get it to work as expected.


Answer 0

Use isinstance, nothing else:

if isinstance(x, pd.DataFrame):
    ... # do something

PEP8 says explicitly that isinstance is the preferred way to check types

No:  type(x) is pd.DataFrame
No:  type(x) == pd.DataFrame
Yes: isinstance(x, pd.DataFrame)

And don’t even think about

if obj.__class__.__name__ == 'DataFrame':
    expect_problems_some_day()

isinstance handles inheritance (see What are the differences between type() and isinstance()?). For example, it will tell you if a variable is a string (either str or unicode), because both derive from basestring:

if isinstance(obj, basestring):
    i_am_string(obj)

Specifically for pandas DataFrame objects:

import pandas as pd
isinstance(var, pd.DataFrame)

Answer 1

Use the built-in isinstance() function.

import pandas as pd

def f(var):
    if isinstance(var, pd.DataFrame):
        print("do stuff")

How to convert a boolean array to an int array

Question: How to convert a boolean array to an int array

I use Scilab, and want to convert an array of booleans into an array of integers:

>>> x = np.array([4, 3, 2, 1])
>>> y = 2 >= x
>>> y
array([False, False,  True,  True], dtype=bool)

In Scilab I can use:

>>> bool2s(y)
0.    0.    1.    1.  

or even just multiply it by 1:

>>> 1*y
0.    0.    1.    1.  

Is there a simple command for this in Python, or would I have to use a loop?


Answer 0

Numpy arrays have an astype method. Just do y.astype(int).

Note that it might not even be necessary to do this, depending on what you’re using the array for. Bool will be autopromoted to int in many cases, so you can add it to int arrays without having to explicitly convert it:

>>> x
array([ True, False,  True], dtype=bool)
>>> x + [1, 2, 3]
array([2, 2, 4])

Answer 1

The 1*y method works in Numpy too:

>>> import numpy as np
>>> x = np.array([4, 3, 2, 1])
>>> y = 2 >= x
>>> y
array([False, False,  True,  True], dtype=bool)
>>> 1*y                      # Method 1
array([0, 0, 1, 1])
>>> y.astype(int)            # Method 2
array([0, 0, 1, 1]) 

If you are asking for a way to convert Python lists from Boolean to int, you can use map to do it:

>>> testList = [False, False,  True,  True]
>>> map(lambda x: 1 if x else 0, testList)
[0, 0, 1, 1]
>>> map(int, testList)
[0, 0, 1, 1]

Or using list comprehensions:

>>> testList
[False, False, True, True]
>>> [int(elem) for elem in testList]
[0, 0, 1, 1]

Answer 2

Using numpy, you can do:

y = x.astype(int)

If you were using a non-numpy array, you could use a list comprehension:

y = [int(val) for val in x]

Answer 3

Most of the time you don’t need conversion:

>>> np.array([True, True, False, False]) + np.array([1, 2, 3, 4])
array([2, 3, 3, 4])

The right way to do it is:

yourArray.astype(int)

or

yourArray.astype(float)

Answer 4

I know you asked for non-looping solutions, but the only solutions I can come up with probably loop internally anyway:

map(int,y)

or:

[i*1 for i in y]

or:

import numpy
y=numpy.array(y)
y*1

Answer 5

A funny way to do this is

>>> np.array([True, False, False]) + 0 
array([1, 0, 0])

Can an iterator be reset in Python?

Question: Can an iterator be reset in Python?

Can I reset an iterator / generator in Python? I am using DictReader and would like to reset it to the beginning of the file.


Answer 0

I see many answers suggesting itertools.tee, but that’s ignoring one crucial warning in the docs for it:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

Basically, tee is designed for those situations where two (or more) clones of one iterator, while “getting out of sync” with each other, don’t do so by much; rather, they stay in the same “vicinity” (a few items behind or ahead of each other). Not suitable for the OP’s problem of “redo from the start”.

L = list(DictReader(...)) on the other hand is perfectly suitable, as long as the list of dicts can fit comfortably in memory. A new “iterator from the start” (very lightweight and low-overhead) can be made at any time with iter(L), and used in part or in whole without affecting new or existing ones; other access patterns are also easily available.

As several answers rightly remarked, in the specific case of csv you can also .seek(0) the underlying file object (a rather special case). I’m not sure that’s documented and guaranteed, though it does currently work; it would probably be worth considering only for truly huge csv files, for which the list I recommend as the general approach would have too large a memory footprint.
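
As a minimal sketch of that list-based approach (the file name here is hypothetical):

import csv

# Materialize all the rows once...
with open('data.csv') as f:
    L = list(csv.DictReader(f))

# ...then make as many fresh, low-overhead iterators as you like.
it1 = iter(L)
next(it1)                 # consume a few items from the first iterator

it2 = iter(L)             # a brand-new "iterator from the start"
assert next(it2) == L[0]  # unaffected by how far it1 has advanced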


Answer 1

If you have a csv file named ‘blah.csv’ that looks like

a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6

you know that you can open the file for reading, and create a DictReader with

blah = open('blah.csv', 'r')
reader= csv.DictReader(blah)

Then, you will be able to get the next line with reader.next(), which should output

{'a':1,'b':2,'c':3,'d':4}

using it again will produce

{'a':2,'b':3,'c':4,'d':5}

However, at this point if you use blah.seek(0), the next time you call reader.next() you will get

{'a':1,'b':2,'c':3,'d':4}

again.

This seems to be the functionality you’re looking for. I’m sure there are some tricks associated with this approach that I’m not aware of, however. @Brian suggested simply creating another DictReader. This won’t work if your first reader is halfway through reading the file, as your new reader will have unexpected keys and values from wherever you are in the file.


Answer 2

No. Python’s iterator protocol is very simple, and only provides one single method (.next() or __next__()), and no method to reset an iterator in general.

The common pattern is to instead create a new iterator using the same procedure again.
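
For instance, a list is iterable any number of times, so “resetting” is just a matter of asking for a new iterator:

nums = [1, 2, 3]
it = iter(nums)
list(it)         # [1, 2, 3]; 'it' is now exhausted
it = iter(nums)  # create a new iterator over the same data
next(it)         # 1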

If you want to “save off” an iterator so that you can go back to its beginning, you may also fork the iterator by using itertools.tee


Answer 3

Yes, if you use numpy.nditer to build your iterator.

>>> lst = [1,2,3,4,5]
>>> itr = numpy.nditer([lst])
>>> itr.next()
1
>>> itr.next()
2
>>> itr.finished
False
>>> itr.reset()
>>> itr.next()
1

Answer 4

There’s a bug in using .seek(0) as advocated by Alex Martelli and Wilduck above, namely that the next call to .next() will give you a dictionary of your header row in the form of {key1:key1, key2:key2, ...}. The workaround is to follow file.seek(0) with a call to reader.next() to get rid of the header row.

So your code would look something like this:

f_in = open('myfile.csv','r')
reader = csv.DictReader(f_in)

for record in reader:
    if some_condition:
        # reset reader to first row of data on 2nd line of file
        f_in.seek(0)
        reader.next()
        continue
    do_something(record)

Answer 5

This is perhaps orthogonal to the original question, but one could wrap the iterator in a function that returns the iterator.

def get_iter():
    return iterator

To reset the iterator, just call the function again. This is of course trivial when the function takes no arguments.

In the case that the function requires some arguments, use functools.partial to create a closure that can be passed instead of the original iterator.

def get_iter(arg1, arg2):
   return iterator
from functools import partial
iter_clos = partial(get_iter, a1, a2)

This seems to avoid the caching that tee (n copies) or list (1 copy) would need to do.
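
For illustration, here is a concrete (hypothetical) version of the sketch above, where the factory builds a genuinely fresh iterator on each call:

from functools import partial

def get_iter(start, stop):
    # build a brand-new iterator on every call
    return iter(range(start, stop))

iter_clos = partial(get_iter, 0, 3)

list(iter_clos())  # [0, 1, 2]
list(iter_clos())  # [0, 1, 2] again -- each call yields a fresh iterator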


Answer 6

For small files, you may consider using more_itertools.seekable – a third-party tool that offers resetting iterables.

Demo

import csv

import more_itertools as mit


filename = "data/iris.csv"
with open(filename, "r") as f:
    reader = csv.DictReader(f)
    iterable = mit.seekable(reader)                    # 1
    print(next(iterable))                              # 2
    print(next(iterable))
    print(next(iterable))

    print("\nReset iterable\n--------------")
    iterable.seek(0)                                   # 3
    print(next(iterable))
    print(next(iterable))
    print(next(iterable))

Output

{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}

Reset iterable
--------------
{'Sepal width': '3.5', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '5.1', 'Species': 'Iris-setosa'}
{'Sepal width': '3', 'Petal width': '0.2', 'Petal length': '1.4', 'Sepal length': '4.9', 'Species': 'Iris-setosa'}
{'Sepal width': '3.2', 'Petal width': '0.2', 'Petal length': '1.3', 'Sepal length': '4.7', 'Species': 'Iris-setosa'}

Here a DictReader is wrapped in a seekable object (1) and advanced (2). The seek() method is used to reset/rewind the iterator to the 0th position (3).

Note: memory consumption grows with iteration, so be wary of applying this tool to large files, as indicated in the docs.


Answer 7

While there is no iterator reset, the “itertools” module from python 2.6 (and later) has some utilities that can help there. One of them is the “tee” which can make multiple copies of an iterator, and cache the results of the one running ahead, so that these results are used on the copies. It will serve your purposes:

>>> def printiter(n):
...   for i in xrange(n):
...     print "iterating value %d" % i
...     yield i

>>> from itertools import tee
>>> a, b = tee(printiter(5), 2)
>>> list(a)
iterating value 0
iterating value 1
iterating value 2
iterating value 3
iterating value 4
[0, 1, 2, 3, 4]
>>> list(b)
[0, 1, 2, 3, 4]

Answer 8

For DictReader:

f = open(filename, "rb")
d = csv.DictReader(f, delimiter=",")

f.seek(0)
d.__init__(f, delimiter=",")

For DictWriter:

f = open(filename, "rb+")
d = csv.DictWriter(f, fieldnames=fields, delimiter=",")

f.seek(0)
f.truncate(0)
d.__init__(f, fieldnames=fields, delimiter=",")
d.writeheader()
f.flush()

Answer 9

list(generator()) returns all remaining values for a generator and effectively resets it if it is not looped.
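
A short illustration of what that means in practice (exhausting one generator object, then calling the generator function again for a fresh one):

def gen():
    for i in (1, 2, 3):
        yield i

g = gen()
next(g)      # 1
list(g)      # [2, 3] -- the remaining values; g is now exhausted
list(gen())  # [1, 2, 3] -- a new call gives a fresh generator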


Answer 10

Problem

I’ve had the same issue before. After analyzing my code, I realized that attempting to reset the iterator inside of loops slightly increases the time complexity and it also makes the code a bit ugly.

Solution

Open the file and save the rows to a variable in memory.

# initialize list of rows
rows = []

# open the file and temporarily name it as 'my_file'
with open('myfile.csv', 'rb') as my_file:

    # set up the reader using the opened file
    myfilereader = csv.DictReader(my_file)

    # loop through each row of the reader
    for row in myfilereader:
        # add the row to the list of rows
        rows.append(row)

Now you can loop through rows anywhere in your scope without dealing with an iterator.


Answer 11

One possible option is to use itertools.cycle(), which will allow you to iterate indefinitely without any trick like .seek(0).

iterDic = itertools.cycle(csv.DictReader(open('file.csv')))
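
Keep in mind that cycle() caches the rows internally and never stops, so a plain for loop over it will not terminate; bounding it, for example with itertools.islice, is one way to use it safely (a sketch, assuming a file named file.csv):

import csv
import itertools

iterDic = itertools.cycle(csv.DictReader(open('file.csv')))

# take the first 10 rows, wrapping around to the start if the file is shorter
for row in itertools.islice(iterDic, 10):
    print(row)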

Answer 12

I’m arriving at this same issue – while I like the tee() solution, I don’t know how big my files are going to be and the memory warnings about consuming one first before the other are putting me off adopting that method.

Instead, I’m creating a pair of iterators using iter() statements, and using the first for my initial run-through, before switching to the second one for the final run.

So, in the case of a dict-reader, if the reader is defined using:

d = csv.DictReader(f, delimiter=",")

I can create a pair of iterators from this “specification” – using:

d1, d2 = iter(d), iter(d)

I can then run my 1st-pass code against d1, safe in the knowledge that the second iterator d2 has been defined from the same root specification.

I’ve not tested this exhaustively, but it appears to work with dummy data.


Answer 13

Only if the underlying type provides a mechanism for doing so (e.g. fp.seek(0)).


Answer 14

Return a newly created iterator at the last iteration during the ‘iter()’ call

class ResetIter: 
  def __init__(self, num):
    self.num = num
    self.i = -1

  def __iter__(self):
    if self.i == self.num-1: # here, return the new object
      return self.__class__(self.num) 
    return self

  def __next__(self):
    if self.i == self.num-1:
      raise StopIteration

    if self.i <= self.num-1:
      self.i += 1
      return self.i


reset_iter = ResetIter(10)
for i in reset_iter:
  print(i, end=' ')
print()

for i in reset_iter:
  print(i, end=' ')
print()

for i in reset_iter:
  print(i, end=' ')

Output:

0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 

Concatenate a list of pandas dataframes together

Question: Concatenate a list of pandas dataframes together

I have a list of Pandas dataframes that I would like to combine into one Pandas dataframe. I am using Python 2.7.10 and Pandas 0.16.2

I created the list of dataframes from:

import pandas as pd
dfs = []
sqlall = "select * from mytable"

for chunk in pd.read_sql_query(sqlall , cnxn, chunksize=10000):
    dfs.append(chunk)

This returns a list of dataframes

type(dfs[0])
Out[6]: pandas.core.frame.DataFrame

type(dfs)
Out[7]: list

len(dfs)
Out[8]: 408

Here is some sample data

# sample dataframes
d1 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one' : [5., 6., 7., 8.], 'two' : [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one' : [15., 16., 17., 18.], 'two' : [19., 10., 11., 12.]})

# list of dataframes
mydfs = [d1, d2, d3]

I would like to combine d1, d2, and d3 into one pandas dataframe. Alternatively, a method of reading a large-ish table directly into a dataframe when using the chunksize option would be very helpful.


Answer 0

Given that all the dataframes have the same columns, you can simply concat them:

import pandas as pd
df = pd.concat(list_of_dataframes)
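
One detail worth noting: chunks read with chunksize each carry their own 0-based index, so the concatenated frame will contain duplicate index labels. Passing ignore_index=True to pd.concat renumbers the rows instead:

import pandas as pd

d1 = pd.DataFrame({'one': [1., 2.], 'two': [4., 3.]})
d2 = pd.DataFrame({'one': [5., 6.], 'two': [9., 10.]})

# without ignore_index the result's index would be 0, 1, 0, 1
df = pd.concat([d1, d2], ignore_index=True)  # index becomes 0, 1, 2, 3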

Answer 1

If the dataframes DO NOT all have the same columns, try the following:

df = pd.DataFrame.from_dict(map(dict,df_list))

Answer 2

You can also do it with functional programming:

from functools import reduce
reduce(lambda df1, df2: df1.merge(df2, "outer"), mydfs)

Answer 3

concat also works nicely with a list comprehension pulled using the “loc” command against an existing dataframe

df = pd.read_csv('./data.csv') # ie; Dataframe pulled from csv file with a "userID" column

review_ids = ['1','2','3'] # ie; ID values to grab from DataFrame

# Gets rows in df where IDs match in the userID column and combines them 

dfa = pd.concat([df.loc[df['userID'] == x] for x in review_ids])

Should I add the Django migration files to the .gitignore file?

Question: Should I add the Django migration files to the .gitignore file?

Should I be adding the Django migration files in the .gitignore file?

I’ve recently been getting a lot of git issues due to migration conflicts and was wondering if I should be marking migration files as ignore.

If so, how would I go about adding all of the migrations that I have in my apps, and adding them to the .gitignore file?


Answer 0

Quoting from the Django migrations documentation:

The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.

If you follow this process, you shouldn’t be getting any merge conflicts in the migration files.

When merging version control branches, you still may encounter a situation where you have multiple migrations based on the same parent migration, e.g. if two different developers introduced a migration concurrently. One way of resolving this situation is to introduce a merge migration. Often this can be done automatically with the command

./manage.py makemigrations --merge

which will introduce a new migration that depends on all current head migrations. Of course this only works when there is no conflict between the head migrations, in which case you will have to resolve the problem manually.


Given that some people here suggested that you shouldn’t commit your migrations to version control, I’d like to expand on the reasons why you actually should do so.

First, you need a record of the migrations applied to your production systems. If you deploy changes to production and want to migrate the database, you need a description of the current state. You can create a separate backup of the migrations applied to each production database, but this seems unnecessarily cumbersome.

Second, migrations often contain custom, handwritten code. It’s not always possible to automatically generate them with ./manage.py makemigrations.

Third, migrations should be included in code review. They are significant changes to your production system, and there are lots of things that can go wrong with them.

So in short, if you care about your production data, please check your migrations into version control.


Answer 1

You can follow the below process.

You can run makemigrations locally and this creates the migration file. Commit this new migration file to the repo.

In my opinion you should not run makemigrations in production at all. You can run migrate in production and you will see the migrations are applied from the migration file that you committed from local. This way you can avoid all conflicts.

IN LOCAL ENV, to create the migration files,

python manage.py makemigrations 
python manage.py migrate

Now commit these newly created files, something like below.

git add app/migrations/...
git commit -m 'add migration files' app/migrations/...

IN PRODUCTION ENV, run only the below command.

python manage.py migrate

Answer 2

Quote from the 2018 docs, Django 2.0. (two separate commands = makemigrations and migrate)

The reason that there are separate commands to make and apply migrations is because you’ll commit migrations to your version control system and ship them with your app; they not only make your development easier, they’re also useable by other developers and in production.

https://docs.djangoproject.com/en/2.0/intro/tutorial02/


Answer 3

TL;DR: commit migrations, resolve migration conflicts, adjust your git workflow.

Feels like you’d need to adjust your git workflow, instead of ignoring conflicts.

Ideally, every new feature is developed in a different branch, and merged back with a pull request.

PRs cannot be merged if there’s a conflict, therefore whoever needs to merge their feature needs to resolve the conflict, migrations included. This might need coordination between different teams.

It is important though to commit migration files! If a conflict arises, Django might even help you solve those conflicts ;)


Answer 4

I can’t imagine why you would be getting conflicts, unless you’re editing the migrations somehow? That usually ends badly – if someone misses some intermediate commits then they won’t be upgrading from the correct version, and their copy of the database will be corrupted.

The process that I follow is pretty simple – whenever you change the models for an app, you also commit a migration, and then that migration doesn’t change – if you need something different in the model, then you change the model and commit a new migration alongside your changes.

In greenfield projects, you can often delete the migrations and start over from scratch with a 0001_ migration when you release, but if you have production code, then you can’t (though you can squash migrations down into one).


Answer 5

The solution usually used, is that, before anything is merged into master, the developer must pull any remote changes. If there’s a conflict in migration versions, he should rename his local migration (the remote one has been run by other devs, and, potentially, in production), to N+1.

During development it might be okay to just not-commit migrations (don’t add an ignore though, just don’t add them). But once you’ve gone into production, you’ll need them in order to keep the schema in sync with model changes.

You then need to edit the file, and change the dependencies to the latest remote version.
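
For illustration, a hypothetical renamed migration whose dependencies have been pointed at the latest remote migration (the app, file and field names here are made up):

# app/migrations/0005_add_field.py  (renamed from 0004_add_field.py)
from django.db import migrations, models

class Migration(migrations.Migration):

    dependencies = [
        # edited to depend on the migration that arrived from the remote
        ('app', '0004_remote_change'),
    ]

    operations = [
        migrations.AddField('mymodel', 'myfield', models.IntegerField(default=0)),
    ]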

This works for Django migrations, as well as other similar apps (sqlalchemy+alembic, RoR, etc).


Answer 6

Having a bunch of migration files in git is messy. There is only one file in the migrations folder that you should not ignore: the __init__.py file. If you ignore it, python will no longer look for submodules inside the directory, so any attempts to import the modules will fail. So the question should be: how do I ignore all migration files but __init__.py? The solution is: add ‘0*.py’ to your .gitignore file and it does the job perfectly.

Hope this helps someone.


Answer 7

Gitignore the migrations if you have separate DBs for the Development, Staging and Production environments. For dev purposes you can use a local sqlite DB and play with migrations locally. I would recommend that you create four additional branches:

  1. Master – Clean fresh code without migrations. Nobody is connected to this branch. Used for code reviews only

  2. Development – daily development. Push/pull accepted. Each developer is working on sqlite DB

  3. Cloud_DEV_env – remote cloud/server DEV environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Dev database

  4. Cloud_STAG_env – remote cloud/server STAG environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Stag database

  5. Cloud_PROD_env – remote cloud/server DEV environment. Pull only. Keep migrations locally on machine, which is used for the code deployment and remote migrations of Prod database

Notes: 2, 3, 4 – migrations can be kept in repos but there should be strict rules of pull requests merging, so we decided to find a person, responsible for deployments, so the only guy who has all the migration files – our deploy-er. He keeps the remote DB migrations each time we have any changes in Models.


Answer 8

Short answer: I propose excluding migrations from the repo. After a code merge, just run ./manage.py makemigrations and you are all set.

Long answer: I don’t think you should put migration files into the repo. It will spoil the migration states in other people’s dev environments and in other prod and stage environments. (Refer to Sugar Tang’s comment for examples.)

From my point of view, the purpose of Django migrations is to find the gap between the previous model state and the new model state, and then serialise that gap. If your model changes after a code merge, you can simply run makemigrations to find the gap. Why would you want to manually and carefully merge other migrations when you can achieve the same automatically and bug-free? The Django documentation says,

They (migrations) are designed to be mostly automatic

; please keep it that way. To merge migrations manually, you have to fully understand what others have changed and any dependencies among those changes. That’s a lot of overhead and error-prone. So tracking the models file is sufficient.

It is a good topic on the workflow. I am open to other options.


In Python, how can I load YAML mappings as OrderedDicts?

Question: In Python, how can I load YAML mappings as OrderedDicts?

I’d like to get PyYAML‘s loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.

What’s the best way to do that?


Answer 0

Update: In python 3.6+ you probably don’t need OrderedDict at all due to the new dict implementation that has been in use in pypy for some time (although considered CPython implementation detail for now).

Update: In python 3.7+, the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec, see What’s New In Python 3.7.

I like @James’ solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. Especially, when writing library code this is a bad idea. Also, it doesn’t directly work with yaml.safe_load().

Fortunately, the solution can be improved without much effort:

import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

# usage example:
ordered_load(stream, yaml.SafeLoader)

For serialization, I don’t know an obvious generalization, but at least this shouldn’t have any side effects:

def ordered_dump(data, stream=None, Dumper=yaml.Dumper, **kwds):
    class OrderedDumper(Dumper):
        pass
    def _dict_representer(dumper, data):
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            data.items())
    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)

# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)

Answer 1

The yaml module allows you to specify custom ‘representers’ to convert Python objects to text and ‘constructors’ to reverse the process.

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

def dict_representer(dumper, data):
    return dumper.represent_dict(data.iteritems())

def dict_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))

yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
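
With both hooks registered, a round trip might look like this (a sketch; note that iteritems() above is Python 2, so use data.items() on Python 3):

import collections
import yaml

d = collections.OrderedDict([('b', 1), ('a', 2)])
text = yaml.dump(d)       # keys are emitted in insertion order: b, a
loaded = yaml.load(text)  # parsed back as a collections.OrderedDict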

Answer 2

2018 option:

oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:

import oyaml as yaml

You’ll no longer be annoyed by screwed-up mappings when dumping/loading.

Note: I’m the author of oyaml.


Answer 3

2015 (and later) option:

ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it will also preserve comments, anchor names and tags, and it supports the YAML 1.2 specification (released 2009).

The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper¹:

import sys
from ruamel.yaml import YAML

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)

will give you:

3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else

data is of type CommentedMap which functions like a dict, but has extra information that is kept around until being dumped (including the preserved comment!)


Answer 4

Note: there is a library, based on the following answer, which implements also the CLoader and CDumpers: Phynix/yamlloader

I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.

import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError, exc:
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping

Answer 5

Update: the library was deprecated in favor of the yamlloader (which is based on the yamlordereddictloader)

I’ve just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:

import yaml
import yamlordereddictloader

datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)

Answer 6

On my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. It now supports the object_pairs_hook option for the load commands. A diff of the changes I made is below.

__init__.py

$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)

constructor.py

$ diff constructor.py Original
20,21c20
<     def __init__(self, object_pairs_hook=dict):
<         self.object_pairs_hook = object_pairs_hook
---
>     def __init__(self):
27,29d25
<     def create_object_hook(self):
<         return self.object_pairs_hook()
<
54,55c50,51
<         self.constructed_objects = self.create_object_hook()
<         self.recursive_objects = self.create_object_hook()
---
>         self.constructed_objects = {}
>         self.recursive_objects = {}
129c125
<         mapping = self.create_object_hook()
---
>         mapping = {}
400c396
<         data = self.create_object_hook()
---
>         data = {}
595c591
<             dictitems = self.create_object_hook()
---
>             dictitems = {}
602c598
<             dictitems = value.get('dictitems', self.create_object_hook())
---
>             dictitems = value.get('dictitems', {})

loader.py

$ diff loader.py Original
13c13
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
18c18
<         BaseConstructor.__init__(self, **constructKwds)
---
>         BaseConstructor.__init__(self)
23c23
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
28c28
<         SafeConstructor.__init__(self, **constructKwds)
---
>         SafeConstructor.__init__(self)
33c33
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
38c38
<         Constructor.__init__(self, **constructKwds)
---
>         Constructor.__init__(self)

Answer 7

Here’s a simple solution that also checks for duplicated top-level keys in your map.

import yaml
import re
from collections import OrderedDict

def yaml_load_od(fname):
    "load a yaml file as an OrderedDict"
    # detects any duped keys (fail on this) and preserves order of top level keys
    with open(fname, 'r') as f:
        lines = f.read().splitlines()
        top_keys = []
        duped_keys = []
        for line in lines:
            m = re.search(r'^([A-Za-z0-9_]+) *:', line)
            if m:
                if m.group(1) in top_keys:
                    duped_keys.append(m.group(1))
                else:
                    top_keys.append(m.group(1))
        if duped_keys:
            raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass to set up the OrderedDict
    with open(fname, 'r') as f:
        d_tmp = yaml.load(f)
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])

How do I install pip for Python 3 on Mac OS X?

Question: How do I install pip for Python 3 on Mac OS X?

OS X (Mavericks) has Python 2.7 stock installed. But I do all my own personal Python stuff with 3.3. I just flushed my 3.3.2 install and installed the new 3.3.3. So I need to install pyserial again. I can do it the way I’ve done it before, which is:

  1. Download pyserial from pypi
  2. untar pyserial.tgz
  3. cd pyserial
  4. python3 setup.py install

But I’d like to do like the cool kids do, and just do something like pip3 install pyserial. But it’s not clear how I get to that point. And just that point. Not interested (unless I have to be) in virtualenv yet.


Answer 0

UPDATE: This is no longer necessary with Python3.4. It installs pip3 as part of the stock install.

I ended up posting this same question on the python mailing list, and got the following answer:

# download and install setuptools
curl -O https://bootstrap.pypa.io/ez_setup.py
python3 ez_setup.py
# download and install pip
curl -O https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py

This solved my question perfectly. After adding the following of my own:

cd /usr/local/bin
ln -s ../../../Library/Frameworks/Python.framework/Versions/3.3/bin/pip pip

So that I could run pip directly, I was able to:

# use pip to install
pip install pyserial

or:

# Don't want it?
pip uninstall pyserial

Answer 1

I had to go through this process myself and chose a different way that I think is better in the long run.

I installed homebrew

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

then:

brew doctor

The last step gives you some warnings and errors that you have to resolve. One of those will be to download and install the Mac OS X command-line tools.

then:

brew install python3

This gave me python3 and pip3 in my path.

pieter$ which pip3 python3
/usr/local/bin/pip3
/usr/local/bin/python3

回答 2

在Mac上安装Python3

1. brew install python3
2. curl https://bootstrap.pypa.io/get-pip.py | python3
3. python3

使用pip3安装模块

1. pip3 install ipython
2. python3 -m IPython

:)

Install Python3 on mac

1. brew install python3
2. curl https://bootstrap.pypa.io/get-pip.py | python3
3. python3

Use pip3 to install modules

1. pip3 install ipython
2. python3 -m IPython

:)


回答 3

另外:当您在python3下安装requests时,命令为:

pip3 install requests

而不是:

pip install requests

Plus: when you install requests with python3, the command is:

pip3 install requests

not

pip install requests

回答 4

  1. brew install python3
  2. 在您的shell配置文件中创建别名

    • 例如,在我的.zshrc中添加alias pip3="python3 -m pip"

➜ ~ pip3 --version

来自/usr/local/lib/python3.6/site-packages(python 3.6)的pip 9.0.1

  1. brew install python3
  2. create alias in your shell profile

    • eg. alias pip3="python3 -m pip" in my .zshrc

➜ ~ pip3 --version

pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)


回答 5

这是我的简单解决方案:

如果您的系统中同时安装了python2和python3,则默认情况下pip升级将指向python2。因此,我们必须指定python(python3)的版本并使用以下命令:

python3 -m pip install --upgrade pip

此命令将卸载以前安装的pip并安装新版本,从而完成pip的升级。

这将节省内存并让您的系统更整洁。

图像-在MacOS上如何在Python3中升级pip

Here is my simple solution:

If you have python2 and python3 both installed in your system, the pip upgrade will point to python2 by default. Hence, we must specify the version of python(python3) and use the below command:

python3 -m pip install --upgrade pip

This command will uninstall the previously installed pip and install the new version- upgrading your pip.

This will save memory and declutter your system.

Image – How the upgrading of pip in Python3 works on MacOS
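
If you are ever unsure which interpreter a given pip command belongs to, a small check like the following sketch can help (nothing Mac-specific; run it under whichever python you want to inspect):

import sys

# shows which interpreter is actually running, so you can tell
# whether `pip` / `pip3` resolve to python2 or python3
print(sys.executable)
print(sys.version)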


回答 6

使用Python EasyInstall(我想这正是您想用的)非常简单!

sudo easy_install pip

然后,要用pip安装Pyserial,您可以执行以下操作:

pip install pyserial

To use Python EasyInstall (which is what I think you’re wanting to use), is super easy!

sudo easy_install pip

so then with pip to install Pyserial you would do:

pip install pyserial

回答 7

另外,值得一提的是Mac OS X / macOS用户可以直接使用Homebrew安装pip3。

$> brew update
$> brew install python3
$> pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

Also, it’s worth mentioning that Mac OS X/macOS users can just use Homebrew to install pip3.

$> brew update
$> brew install python3
$> pip3 --version
pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)

回答 8

在Mac OS X Mojave上,python代表2.7版本的Python,python3代表3版本的Python。pip和pip3同理。所以,要为Python 3升级pip,请执行:

~$ sudo pip3 install --upgrade pip

On Mac OS X Mojave python stands for python of version 2.7 and python3 for python of version 3. The same is pip and pip3. So, to upgrade pip for python 3 do this:

~$ sudo pip3 install --upgrade pip

回答 9

在MacOS 10.12上

下载pip:get-pip.py

下载python3:python3

  1. 安装python3
  2. 打开终端: python3 get-pip.py
  3. pip3 可用

On MacOS 10.12

download pip: pip as get-pip.py

download python3: python3

  1. install python3
  2. open terminal: python3 get-pip.py
  3. pip3 is available

回答 10

使用brew安装python3时,pip3会自动装好:

  1. brew install python3
  2. pip3 --version

pip3 is installed automatically with python3 using brew:

  1. brew install python3
  2. pip3 --version

回答 11

如果您的Mac上未安装pip,则只需在终端上运行以下命令即可。

sudo easy_install pip

在此处下载python 3: python3

完成这两个步骤后,请确保运行以下命令以验证是否已成功安装它们。

python3 --version
pip3 --version

Simply run the following in the terminal if you don’t have pip installed on your Mac.

sudo easy_install pip

download python 3 here: python3

once you’re done with these 2 steps, make sure to run the following to verify whether you’ve installed them successfully.

python3 --version
pip3 --version

回答 12

对于全新的Mac,您需要执行以下步骤:-

  1. 确保已安装 Xcode
  2. sudo easy_install pip
  3. /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  4. brew doctor
  5. brew doctor
  6. brew install python3

完成后,只需在终端上键入python3,就会看到python 3已经装好。

For a fresh new Mac, you need to follow below steps:-

  1. Make sure you have installed Xcode
  2. sudo easy_install pip
  3. /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
  4. brew doctor
  5. brew doctor
  6. brew install python3

And you are done, just type python3 on terminal and you will see python 3 installed.


回答 13

我在python3和pip3上遇到了同样的问题。解决办法:在执行以下命令时,把所有与链接等相关的冲突都解决掉:

brew doctor

之后

brew reinstall python3

I had the same problem with python3 and pip3. Decision: solving all conflicts with links and other stuff when do

brew doctor

After that

brew reinstall python3

比“无法解码JSON对象”显示更好的错误消息

问题:比“无法解码JSON对象”显示更好的错误消息

Python代码可从一些冗长而复杂的JSON文件加载数据:

with open(filename, "r") as f:
  data = json.loads(f.read())

(注意:最佳代码版本应为:

with open(filename, "r") as f:
  data = json.load(f)

但两者都表现出相似的行为)

对于许多类型的JSON错误(缺少分隔符,字符串中不正确的反斜杠等),这会打印出一条非常有用的消息,其中包含找到JSON错误的行号和列号。

但是,对于其他类型的JSON错误(包括经典的“在列表中的最后一项后面使用逗号”,以及诸如true/false首字母大写之类的其他问题),Python的输出仅为:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

对于这种类型的ValueError,如何让Python告诉您JSON文件中的错误在哪里?

Python code to load data from some long complicated JSON file:

with open(filename, "r") as f:
  data = json.loads(f.read())

(note: the best code version should be:

with open(filename, "r") as f:
  data = json.load(f)

but both exhibit similar behavior)

For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.

However, for other types of JSON error (including the classic “using comma on the last item in a list”, but also other things like capitalising true/false), Python’s output is just:

Traceback (most recent call last):
  File "myfile.py", line 8, in myfunction
    config = json.loads(f.read())
  File "c:\python27\lib\json\__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 360, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

For that type of ValueError, how do you get Python to tell you where is the error in the JSON file?


回答 0

我发现,在内置json模块报错含糊不清的许多情况下,simplejson模块会给出描述性更强的错误。例如,对于列表中最后一项之后的逗号:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

这条信息没什么描述性。用simplejson执行相同的操作:

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

好多了!对其他常见错误(例如首字母大写的True)也是如此。

I’ve found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:

json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded

which is not very descriptive. The same operation with simplejson:

simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)

Much better! Likewise for other common errors like capitalizing True.
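
To act on that extra detail programmatically, you can catch simplejson’s JSONDecodeError, which carries the position of the failure. A minimal sketch (the file name and helper are hypothetical):

import simplejson

def load_json_verbose(path):
    # re-raise with the line/column info that simplejson attaches
    with open(path) as f:
        try:
            return simplejson.load(f)
        except simplejson.JSONDecodeError as e:
            raise ValueError('%s: %s (line %d, column %d)'
                             % (path, e.msg, e.lineno, e.colno))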


回答 1

您无法让python告诉您JSON错在哪里。您需要使用类似这样的在线linter工具。

它会向您显示您尝试解码的JSON中的错误。

You won’t be able to get python to tell you where the JSON is incorrect. You will need to use a linter online somewhere like this

This will show you error in the JSON you are trying to decode.


回答 2

您可以尝试这里的rson库:http://code.google.com/p/rson/ 。它也在PyPI上:https://pypi.python.org/pypi/rson/0.9 ,所以您可以使用easy_install或pip来获取它。

对于tom给出的示例:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON被设计为JSON的超集,因此它可以解析JSON文件。它还有一种替代语法,对人类来说阅读和编辑都友好得多。我在输入文件中大量使用它。

至于布尔值的大小写:rson似乎会把大小写不正确的布尔值当作字符串读取。

>>> rson.loads('[true,False]')
[True, u'False']

You could try the rson library found here: http://code.google.com/p/rson/ . It is also up on PyPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.

for the example given by tom:

>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'

RSON is a designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.

As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.

>>> rson.loads('[true,False]')
[True, u'False']

回答 3

我遇到过类似的问题,原因是使用了单引号。JSON标准(http://json.org)只提到使用双引号,所以python的json库想必也只支持双引号。

I had a similar problem and it was due to singlequotes. The JSON standard(http://json.org) talks only about using double quotes so it must be that the python json library supports only double quotes.


回答 4

对于我遇到的这一特定问题,我在packaging.py文件中找到了load_json_file(path)的函数声明,然后在其中偷偷加了一行print:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

这样,它会在进入try-catch之前打印json文件的内容,因此即使我几乎没有Python知识,也能迅速弄清楚为什么我的配置读不了json文件。
(原因是我的文本编辑器被我设置成了写入UTF-8 BOM……愚蠢)

之所以提这一点,是因为它虽然可能不是对OP具体问题的好答案,但却是定位这种非常恼人的bug来源的快捷方法。我敢打赌,很多正在为MalformedJsonFileError: No JSON object could be decoded when reading …寻找更详细解决方案的人会偶然看到这篇文章,这也许能帮到他们。

For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:

def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))

That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn’t read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)

Just mentioning this because, while maybe not a good answer to the OP’s specific problem, this was a rather quick method in determining the source of a very oppressing bug. And I bet that many people will stumble upon this article who are searching a more verbose solution for a MalformedJsonFileError: No JSON object could be decoded when reading …. So that might help them.


回答 5

就我而言,我的json文件很大,在python中使用普通的json模块时会出现上述错误。

通过sudo pip install simplejson安装simplejson之后,

问题就解决了。

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

As to me, my json file is very large, when use common json in python it gets the above error.

After install simplejson by sudo pip install simplejson.

And then I solved it.

import json
import simplejson


def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)      # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]


if __name__ == '__main__':
    test_parse_json()

回答 6

我有一个类似的问题,这是我的代码:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

问题是我忘了调用file.close(),补上之后问题就解决了。

I had a similar problem this was my code:

    json_file=json.dumps(pyJson)
    file = open("list.json",'w')
    file.write(json_file)  

    json_file = open("list.json","r")
    json_decoded = json.load(json_file)
    print json_decoded

the problem was i had forgotten to file.close() I did it and fixed the problem.


回答 7

可接受的答案是解决问题的最简单方法。但是,如果由于公司政策而不允许您安装simplejson,我建议采用以下解决方案来解决“在列表中的最后一项上使用逗号”这一特定问题:

  1. 创建一个子类“JSONLintCheck”继承自“JSONDecoder”类,并像下面这样覆盖“JSONDecoder”类的__init__方法:

    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)

  2. make_scanner是一个新函数,用于覆盖上述类的“scan_once”方法。代码如下:

    #!/usr/bin/env python
    from json import JSONDecoder
    from json import decoder
    import re

    NUMBER_RE = re.compile(
        r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
        (re.VERBOSE | re.MULTILINE | re.DOTALL))

    def py_make_scanner(context):
        parse_object = context.parse_object
        parse_array = context.parse_array
        parse_string = context.parse_string
        match_number = NUMBER_RE.match
        encoding = context.encoding
        strict = context.strict
        parse_float = context.parse_float
        parse_int = context.parse_int
        parse_constant = context.parse_constant
        object_hook = context.object_hook
        object_pairs_hook = context.object_pairs_hook

        def _scan_once(string, idx):
            try:
                nextchar = string[idx]
            except IndexError:
                # 原先这里是 raise StopIteration
                raise ValueError(decoder.errmsg("Could not get the next character", string, idx))

            if nextchar == '"':
                return parse_string(string, idx + 1, encoding, strict)
            elif nextchar == '{':
                return parse_object((string, idx + 1), encoding, strict,
                    _scan_once, object_hook, object_pairs_hook)
            elif nextchar == '[':
                return parse_array((string, idx + 1), _scan_once)
            elif nextchar == 'n' and string[idx:idx + 4] == 'null':
                return None, idx + 4
            elif nextchar == 't' and string[idx:idx + 4] == 'true':
                return True, idx + 4
            elif nextchar == 'f' and string[idx:idx + 5] == 'false':
                return False, idx + 5

            m = match_number(string, idx)
            if m is not None:
                integer, frac, exp = m.groups()
                if frac or exp:
                    res = parse_float(integer + (frac or '') + (exp or ''))
                else:
                    res = parse_int(integer)
                return res, m.end()
            elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
                return parse_constant('NaN'), idx + 3
            elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
                return parse_constant('Infinity'), idx + 8
            elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
                return parse_constant('-Infinity'), idx + 9
            else:
                # 原先这里是 raise StopIteration,修改之处就在这里
                raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))
        return _scan_once

    make_scanner = py_make_scanner

  3. 最好将“make_scanner”函数与新的子类放进同一个文件里。

The accepted answer is the easiest one to fix the problem. But in case you are not allowed to install the simplejson due to your company policy, I propose below solution to fix the particular issue of “using comma on the last item in a list”:

  1. Create a child class “JSONLintCheck” to inherit from the class “JSONDecoder” and override the __init__ method of “JSONDecoder” like below:

    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)

  2. make_scanner is a new function used to override the ‘scan_once’ method of the above class. Here is the code for it:

    #!/usr/bin/env python
    from json import JSONDecoder
    from json import decoder
    import re

    NUMBER_RE = re.compile(
        r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
        (re.VERBOSE | re.MULTILINE | re.DOTALL))

    def py_make_scanner(context):
        parse_object = context.parse_object
        parse_array = context.parse_array
        parse_string = context.parse_string
        match_number = NUMBER_RE.match
        encoding = context.encoding
        strict = context.strict
        parse_float = context.parse_float
        parse_int = context.parse_int
        parse_constant = context.parse_constant
        object_hook = context.object_hook
        object_pairs_hook = context.object_pairs_hook

        def _scan_once(string, idx):
            try:
                nextchar = string[idx]
            except IndexError:
                # was: raise StopIteration
                raise ValueError(decoder.errmsg("Could not get the next character", string, idx))

            if nextchar == '"':
                return parse_string(string, idx + 1, encoding, strict)
            elif nextchar == '{':
                return parse_object((string, idx + 1), encoding, strict,
                    _scan_once, object_hook, object_pairs_hook)
            elif nextchar == '[':
                return parse_array((string, idx + 1), _scan_once)
            elif nextchar == 'n' and string[idx:idx + 4] == 'null':
                return None, idx + 4
            elif nextchar == 't' and string[idx:idx + 4] == 'true':
                return True, idx + 4
            elif nextchar == 'f' and string[idx:idx + 5] == 'false':
                return False, idx + 5

            m = match_number(string, idx)
            if m is not None:
                integer, frac, exp = m.groups()
                if frac or exp:
                    res = parse_float(integer + (frac or '') + (exp or ''))
                else:
                    res = parse_int(integer)
                return res, m.end()
            elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
                return parse_constant('NaN'), idx + 3
            elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
                return parse_constant('Infinity'), idx + 8
            elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
                return parse_constant('-Infinity'), idx + 9
            else:
                # was: raise StopIteration (this is where the modification is needed)
                raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))
        return _scan_once

    make_scanner = py_make_scanner

  3. It is best to put the ‘make_scanner’ function together with the new child class in the same file.
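
For reference, a minimal usage sketch of the subclass above; decode is the standard JSONDecoder entry point, and the file name is hypothetical:

decoder = JSONLintCheck()
with open('config.json') as f:   # hypothetical input file
    data = decoder.decode(f.read())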

回答 8

刚遇到了同样的问题,就我的情况而言,问题出在文件开头的BOM(字节顺序标记)上。

在我删除UTF BOM标记之前,json.tool甚至拒绝处理空文件(只有一对花括号)。

我所做的是:

  • 用vim打开我的json文件,
  • 删除了字节顺序标记(set nobomb
  • 保存文件

这就解决了json.tool的问题。希望这可以帮助!

Just hit the same issue and in my case the problem was related to BOM (byte order mark) at the beginning of the file.

json.tool would refuse to process even empty file (just curly braces) until i removed the UTF BOM mark.

What I have done is:

  • opened my json file with vim,
  • removed byte order mark (set nobomb)
  • save file

This resolved the problem with json.tool. Hope this helps!
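
If you would rather handle the BOM from Python instead of editing the file, the ‘utf-8-sig’ codec strips it transparently. A minimal sketch (the helper name is hypothetical):

import io
import json

def load_json_skip_bom(path):
    # 'utf-8-sig' decodes UTF-8 and silently drops a leading BOM if present
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return json.load(f)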


回答 9

创建文件时,不要创建内容为空的文件,而是用以下方式代替:

json.dump({}, file)

When your file is created. Instead of creating a file with content is empty. Replace with:

json.dump({}, file)

回答 10

您可以使用cjson,它声称比纯python实现快250倍。考虑到您有“某个冗长而复杂的JSON文件”,并且可能需要运行好几次(解码器失败时只报告它们遇到的第一个错误),这一点很有用。

You could use cjson, that claims to be up to 250 times faster than pure-python implementations, given that you have “some long complicated JSON file” and you will probably need to run it several times (decoders fail and report the first error they encounter only).


如何从.py文件手动生成.pyc文件

问题:如何从.py文件手动生成.pyc文件

由于某种原因,我不能依靠Python的“import”语句来自动生成.pyc文件。

有没有一种方法可以实现以下功能?

def py_to_pyc(py_filepath, pyc_filepath):
    ...

For some reason, I can not depend on Python’s “import” statement to generate .pyc file automatically

Is there a way to implement a function as following?

def py_to_pyc(py_filepath, pyc_filepath):
    ...

回答 0

您可以在终端中使用compileall。以下命令将递归进入子目录,并为找到的所有python文件生成pyc文件。compileall模块是Python标准库的一部分,所以你不需要安装任何额外的东西就能使用它。python2和python3的用法完全相同。

python -m compileall .

You can use compileall in the terminal. The following command will go recursively into sub directories and make pyc files for all the python files it finds. The compileall module is part of the python standard library, so you don’t need to install anything extra to use it. This works exactly the same way for python2 and python3.

python -m compileall .

回答 1

您可以使用以下命令从命令行编译单个或多个文件:

python -m compileall <file_1>.py <file_n>.py

You can compile individual files(s) from the command line with:

python -m compileall <file_1>.py <file_n>.py

回答 2

自从我上一次使用Python已经有一段时间了,但是我相信您可以使用py_compile

import py_compile
py_compile.compile("file.py")

It’s been a while since I last used Python, but I believe you can use py_compile:

import py_compile
py_compile.compile("file.py")

回答 3

我发现几种将python脚本编译成字节码的方法

  1. py_compile在终端中使用:

    python -m py_compile File1.py File2.py File3.py ...

    -m 指定要编译的模块名称。

    或者,用于文件的交互式编译

    python -m py_compile -
    File1.py
    File2.py
    File3.py
       .
       .
       .
  2. 使用py_compile.compile

    import py_compile
    py_compile.compile('YourFileName.py')
  3. 使用py_compile.main()

    它一次编译几个文件。

    import py_compile
    py_compile.main(['File1.py','File2.py','File3.py'])

    该列表可以根据需要增长。或者,您显然可以向main传入一个文件列表,甚至通过命令行参数传入文件名。

    或者,如果您向main传入['-'],则它可以交互式地编译文件。

  4. 使用compileall.compile_dir()

    import compileall
    compileall.compile_dir(direname)

    它编译提供的目录中存在的每个Python文件。

  5. 使用compileall.compile_file()

    import compileall
    compileall.compile_file('YourFileName.py')

看一下下面的链接:

https://docs.python.org/3/library/py_compile.html

https://docs.python.org/3/library/compileall.html

I found several ways to compile python scripts into bytecode

  1. Using py_compile in terminal:

    python -m py_compile File1.py File2.py File3.py ...
    

    -m specifies the module(s) name to be compiled.

    Or, for interactive compilation of files

    python -m py_compile -
    File1.py
    File2.py
    File3.py
       .
       .
       .
    
  2. Using py_compile.compile:

    import py_compile
    py_compile.compile('YourFileName.py')
    
  3. Using py_compile.main():

    It compiles several files at a time.

    import py_compile
    py_compile.main(['File1.py','File2.py','File3.py'])
    

    The list can grow as long as you wish. Alternatively, you can obviously pass a list of files in main or even file names in command line args.

    Or, if you pass ['-'] in main then it can compile files interactively.

  4. Using compileall.compile_dir():

    import compileall
    compileall.compile_dir(direname)
    

    It compiles every single Python file present in the supplied directory.

  5. Using compileall.compile_file():

    import compileall
    compileall.compile_file('YourFileName.py')
    

Take a look at the links below:

https://docs.python.org/3/library/py_compile.html

https://docs.python.org/3/library/compileall.html


回答 4

我会用compileall。它在脚本和命令行中都能很好地工作。它是比前面提到的py_compile更高层的模块/工具,其内部也用到了py_compile。

I would use compileall. It works nicely both from scripts and from the command line. It’s a bit higher level module/tool than the already mentioned py_compile that it also uses internally.
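
A minimal sketch of using it from a script (the directory name is hypothetical):

import compileall

# compile every .py file under 'myproject', recursing into subdirectories;
# force=True recompiles even when the .pyc looks up to date
compileall.compile_dir('myproject', force=True, quiet=1)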


回答 5

在Python2中,您可以使用:

python -m compileall <pythonic-project-name>

这样就可以将既包含包又包含模块的项目中的所有.py文件编译为.pyc文件。


在Python3中,您可以使用:

python3 -m compileall <pythonic-project-name>

这样就可以将既包含包又包含模块的项目中的所有.py文件编译到__pycache__文件夹中。

或者,参考这篇文章中browning的做法:

您可以使用以下命令强制让.pyc文件在文件夹中采用与Python2相同的布局:

python3 -m compileall -b <pythonic-project-name>

-b选项会让.pyc文件输出到其旧式位置(即与Python2中相同)。

In Python2 you could use:

python -m compileall <pythonic-project-name>

which compiles all .py files to .pyc files in a project which contains packages as well as modules.


In Python3 you could use:

python3 -m compileall <pythonic-project-name>

which compiles all .py files to __pycache__ folders in a project which contains packages as well as modules.

Or with browning from this post:

You can enforce the same layout of .pyc files in the folders as in Python2 by using:

python3 -m compileall -b <pythonic-project-name>

The option -b triggers the output of .pyc files to their legacy-locations (i.e. the same as in Python2).


回答 6

为了匹配原始问题要求(源路径和目标路径),代码应如下所示:

import py_compile
py_compile.compile(py_filepath, pyc_filepath)

如果输入代码有错误,并且调用时传入了doraise=True,则会引发py_compile.PyCompileError异常。

To match the original question requirements (source path and destination path) the code should be like that:

import py_compile
py_compile.compile(py_filepath, pyc_filepath)

If the input code has errors and doraise=True is passed, then the py_compile.PyCompileError exception is raised.
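
Putting it together, a minimal sketch of the py_to_pyc function from the question, using py_compile’s cfile keyword for the destination path:

import py_compile

def py_to_pyc(py_filepath, pyc_filepath):
    # doraise=True makes syntax errors surface as py_compile.PyCompileError
    py_compile.compile(py_filepath, cfile=pyc_filepath, doraise=True)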


回答 7

有两种方法可以做到这一点

  1. 命令行
  2. 使用python程序

如果使用命令行,请用python -m compileall <argument>将python代码编译为python字节码。例如:python -m compileall -x ./*

或者,您可以使用下面的代码将您的库编译为字节码。

import compileall
import os

lib_path = "your_lib_path"
build_path = "your-dest_path"

# 编译lib_path下的所有.py文件;legacy=True会把.pyc生成在源文件旁边
compileall.compile_dir(lib_path, force=True, legacy=True)

# 递归地把生成的.pyc文件移动到build_path,并保留目录结构
def compile(cu_path):
    for file in os.listdir(cu_path):
        if os.path.isdir(os.path.join(cu_path, file)):
            compile(os.path.join(cu_path, file))
        elif file.endswith(".pyc"):
            dest = os.path.join(build_path, cu_path, file)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.rename(os.path.join(cu_path, file), dest)

compile(lib_path)

详细文档请参阅☞ docs.python.org

There is two way to do this

  1. Command line
  2. Using python program

If you are using command line use python -m compileall <argument> to compile python code to python binary code. Ex: python -m compileall -x ./*

Or, You can use this code to compile your library into byte-code.

import compileall
import os

lib_path = "your_lib_path"
build_path = "your-dest_path"

# compile every .py file under lib_path; legacy=True writes the .pyc files next to the sources
compileall.compile_dir(lib_path, force=True, legacy=True)

# recursively move the generated .pyc files into build_path, preserving the directory structure
def compile(cu_path):
    for file in os.listdir(cu_path):
        if os.path.isdir(os.path.join(cu_path, file)):
            compile(os.path.join(cu_path, file))
        elif file.endswith(".pyc"):
            dest = os.path.join(build_path, cu_path, file)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.rename(os.path.join(cu_path, file), dest)

compile(lib_path)

look at ☞ docs.python.org for detailed documentation