Tag Archives: coding-style

How do you break a long chain of method calls in Python?

Question: How do you break a long chain of method calls in Python?


I have a line of the following code (don’t blame for naming conventions, they are not mine):

subkeyword = Session.query(
    Subkeyword.subkeyword_id, Subkeyword.subkeyword_word
).filter_by(
    subkeyword_company_id=self.e_company_id
).filter_by(
    subkeyword_word=subkeyword_word
).filter_by(
    subkeyword_active=True
).one()

I don’t like how it looks (it’s not very readable), but I don’t have any better idea for limiting lines to 79 characters in this situation. Is there a better way of breaking it up (preferably without backslashes)?


Answer 0


You could use additional parentheses:

subkeyword = (
        Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
        .filter_by(subkeyword_company_id=self.e_company_id)
        .filter_by(subkeyword_word=subkeyword_word)
        .filter_by(subkeyword_active=True)
        .one()
    )

Answer 1


This is a case where a line continuation character is preferred to open parentheses. The need for this style becomes more obvious as method names get longer and as methods start taking arguments:

subkeyword = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id)          \
                    .filter_by(subkeyword_word=subkeyword_word)                  \
                    .filter_by(subkeyword_active=True)                           \
                    .one()

PEP 8 is intended to be interpreted with a measure of common sense and an eye for both the practical and the beautiful. Happily violate any PEP 8 guideline that results in ugly or hard-to-read code.

That being said, if you frequently find yourself at odds with PEP 8, it may be a sign that there are readability issues that transcend your choice of whitespace :-)


Answer 2


My personal choice would be:

subkeyword = Session.query(
    Subkeyword.subkeyword_id,
    Subkeyword.subkeyword_word,
).filter_by(
    subkeyword_company_id=self.e_company_id,
    subkeyword_word=subkeyword_word,
    subkeyword_active=True,
).one()

Answer 3


Just store the intermediate result/object and invoke the next method on it, e.g.

q = Session.query(Subkeyword.subkeyword_id, Subkeyword.subkeyword_word)
q = q.filter_by(subkeyword_company_id=self.e_company_id)
q = q.filter_by(subkeyword_word=subkeyword_word)
q = q.filter_by(subkeyword_active=True)
subkeyword = q.one()

Answer 4


According to the Python Language Reference,
you can use a backslash.
Or simply break the line inside an open bracket: as long as a bracket is left unclosed, Python will not treat what follows as a new logical line, and in that case the indentation of the continuation lines doesn’t matter.
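
For illustration, here is a minimal sketch of both styles (the object and method names are made up):

value = obj.first_method() \
           .second_method() \
           .third_method()

value = (
    obj.first_method()
    .second_method()
    .third_method()
)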


Answer 5


It’s a bit of a different solution than those provided by others, but a favorite of mine since it sometimes leads to nifty metaprogramming.

base = [Subkeyword.subkeyword_id, Subkeyword.subkeyword_word]
search = {
    'subkeyword_company_id':self.e_company_id,
    'subkeyword_word':subkeyword_word,
    'subkeyword_active':True,
    }
subkeyword = Session.query(*base).filter_by(**search).one()

This is a nice technique for building searches. Go through a list of conditionals to mine from your complex query form (or string-based deductions about what the user is looking for), then just explode the dictionary into the filter.
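
As a sketch of that idea (the form_data dict and its keys here are hypothetical):

form_data = {'word': 'spam', 'active_only': True}   # e.g. mined from a query form

search = {'subkeyword_company_id': self.e_company_id}
if form_data.get('word'):
    search['subkeyword_word'] = form_data['word']
if form_data.get('active_only'):
    search['subkeyword_active'] = True

subkeyword = Session.query(*base).filter_by(**search).one()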


Answer 6


You seem to be using SQLAlchemy; if so, the sqlalchemy.orm.query.Query.filter_by() method takes multiple keyword arguments, so you could write:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word) \
                    .filter_by(subkeyword_company_id=self.e_company_id,
                               subkeyword_word=subkeyword_word,
                               subkeyword_active=True) \
                    .one()

But it would be better:

subkeyword = Session.query(Subkeyword.subkeyword_id,
                           Subkeyword.subkeyword_word)
subkeyword = subkeyword.filter_by(subkeyword_company_id=self.e_company_id,
                                  subkeyword_word=subkeyword_word,
                                  subkeyword_active=True)
subkeyword = subkeyword.one()

Answer 7


I like to indent the arguments by two blocks, and the statement by one block, like these:

for image_pathname in image_directory.iterdir():
    image = cv2.imread(str(image_pathname))
    input_image = np.resize(
            image, (height, width, 3)
        ).transpose((2,0,1)).reshape(1, 3, height, width)
    net.forward_all(data=input_image)
    segmentation_index = net.blobs[
            'argmax'
        ].data.squeeze().transpose(1,2,0).astype(np.uint8)
    segmentation = np.empty(segmentation_index.shape, dtype=np.uint8)
    cv2.LUT(segmentation_index, label_colours, segmentation)
    prediction_pathname = prediction_directory / image_pathname.name
    cv2.imwrite(str(prediction_pathname), segmentation)

In Python, when should you use a function instead of a method?

Question: In Python, when should you use a function instead of a method?


The Zen of Python states that there should only be one way to do things, yet frequently I run into the problem of deciding when to use a function versus when to use a method.

Let’s take a trivial example- a ChessBoard object. Let’s say we need some way to get all the legal King moves available on the board. Do we write ChessBoard.get_king_moves() or get_king_moves(chess_board)?

Here are some related questions I looked at:

The answers I got were largely inconclusive:

Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?

The major reason is history. Functions were used for those operations that were generic for a group of types and which were intended to work even for objects that didn’t have methods at all (e.g. tuples). It is also convenient to have a function that can readily be applied to an amorphous collection of objects when you use the functional features of Python (map(), apply() et al).

In fact, implementing len(), max(), min() as a built-in function is actually less code than implementing them as methods for each type. One can quibble about individual cases but it’s a part of Python, and it’s too late to make such fundamental changes now. The functions have to remain to avoid massive code breakage.

While interesting, the above doesn’t really say much as to what strategy to adopt.

This is one of the reasons – with custom methods, developers would be free to choose a different method name, like getLength(), length(), getlength() or whatsoever. Python enforces strict naming so that the common function len() can be used.

Slightly more interesting. My take is that functions are in a sense, the Pythonic version of interfaces.

Lastly, from Guido himself:

Talking about the Abilities/Interfaces made me think about some of our “rogue” special method names. In the Language Reference, it says, “A class can implement certain operations that are invoked by special syntax (such as arithmetic operations or subscripting and slicing) by defining methods with special names.” But there are all these methods with special names like __len__ or __unicode__ which seem to be provided for the benefit of built-in functions, rather than for support of syntax. Presumably in an interface-based Python, these methods would turn into regularly-named methods on an ABC, so that __len__ would become

class container:
  ...
  def len(self):
    raise NotImplemented

Though, thinking about it some more, I don’t see why all syntactic operations wouldn’t just invoke the appropriate normally-named method on a specific ABC. “<“, for instance, would presumably invoke “object.lessthan” (or perhaps “comparable.lessthan“). So another benefit would be the ability to wean Python away from this mangled-name oddness, which seems to me an HCI improvement.

Hm. I’m not sure I agree (figure that :-).

There are two bits of “Python rationale” that I’d like to explain first.

First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:

(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.

(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.

Saying the same thing in another way, I see ‘len’ as a built-in operation. I’d hate to lose that. I can’t say for sure whether you meant that or not, but ‘def len(self): …’ certainly sounds like you want to demote it to an ordinary method. I’m strongly -1 on that.

The second bit of Python rationale I promised to explain is the reason why I chose special methods to look __special__ and not merely special. I was anticipating lots of operations that classes might want to override, some standard (e.g. __add__ or __getitem__), some not so standard (e.g. pickle’s __reduce__ for a long time had no support in C code at all). I didn’t want these special operations to use ordinary method names, because then pre-existing classes, or classes written by users without an encyclopedic memory for all the special methods, would be liable to accidentally define operations they didn’t mean to implement, with possibly disastrous consequences. Ivan Krstić explained this more concise in his message, which arrived after I’d written all this up.

–Guido van Rossum (home page: http://www.python.org/~guido/)

My understanding of this is that in certain cases, prefix notation just makes more sense (ie, Duck.quack makes more sense than quack(Duck) from a linguistic standpoint.) and again, the functions allow for “interfaces”.

In such a case, my guess would be to implement get_king_moves based solely on Guido’s first point. But that still leaves a lot of open questions regarding, say, implementing a stack and queue class with similar push and pop methods: should they be functions or methods? (Here I would guess functions, because I really want to signal a push-pop interface.)

TLDR: Can someone explain what the strategy for deciding when to use functions vs. methods should be?


Answer 0


My general rule is this – is the operation performed on the object or by the object?

If it is done by the object, it should be a member operation. If it could apply to other things too, or is done by something else to the object, then it should be a function (or perhaps a member of something else).

When introducing programming, it is traditional (albeit not quite accurate from an implementation standpoint) to describe objects in terms of real-world objects such as cars. You mention a duck, so let’s go with that.

class duck:
    def __init__(self): pass
    def eat(self, o): pass
    def crap(self): pass
    def die(self): pass
    # ...

In the context of the “objects are real things” analogy, it is “correct” to add a class method for anything which the object can do. So say I want to kill off a duck, do I add a .kill() to the duck? No… as far as I know animals do not commit suicide. Therefore if I want to kill a duck I should do this:

def kill(o):
    if isinstance(o, duck):
        o.die()
    elif isinstance(o, dog):
        print "WHY????"
        o.die()
    elif isinstance(o, nyancat):
        raise Exception("NYAN "*9001)
    else:
       print "can't kill it."

Moving away from this analogy, why do we use methods and classes? Because we want to contain data and hopefully structure our code in a manner such that it will be reusable and extensible in the future. This brings us to the notion of encapsulation which is so dear to OO design.

The encapsulation principle is really what this comes down to: as a designer you should hide everything about the implementation and class internals that it is not absolutely necessary for any user or other developer to access. Because we deal with instances of classes, this reduces to “what operations are crucial on this instance“. If an operation is not instance specific, then it should not be a member function.

TL;DR: what @Bryan said. If it operates on an instance and needs to access data which is internal to the class instance, it should be a member function.


Answer 1


Use a class when you want to:

1) Isolate calling code from implementation details — taking advantage of abstraction and encapsulation.

2) When you want to be substitutable for other objects — taking advantage of polymorphism.

3) When you want to reuse code for similar objects — taking advantage of inheritance.

Use a function for calls that make sense across many different object types — for example, the builtin len and repr functions apply to many kinds of objects.
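
As a small illustration (a made-up class), implementing __len__ is what lets the generic built-in len() work on your own objects as well:

class Deck:
    def __init__(self, cards):
        self.cards = list(cards)

    def __len__(self):          # hook used by the built-in len()
        return len(self.cards)

deck = Deck(['AS', 'KD', '2C'])
print(len(deck))                # 3, the same function that works on lists, strings, ...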

That being said, the choice sometimes comes down to a matter of taste. Think in terms of what is most convenient and readable for typical calls. For example, which would be better (x.sin()**2 + y.cos()**2).sqrt() or sqrt(sin(x)**2 + cos(y)**2)?


Answer 2


Here’s a simple rule of thumb: if the code acts upon a single instance of an object, use a method. Even better: use a method unless there is a compelling reason to write it as a function.

In your specific example, you want it to look like this:

chessboard = Chessboard()
...
chessboard.get_king_moves()

Don’t over think it. Always use methods until the point comes where you say to yourself “it doesn’t make sense to make this a method”, in which case you can make a function.


Answer 3


I usually think of an object like a person.

Attributes are the person’s name, height, shoe size, etc.

Methods and functions are operations that the person can perform.

If the operation could be done by just any ol’ person, without requiring anything unique to this one specific person (and without changing anything on this one specific person), then it’s a function and should be written as such.

If an operation is acting upon the person (e.g. eating, walking, …) or requires something unique to this person to get involved (like dancing, writing a book, …), then it should be a method.
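
A minimal sketch of that rule (the class and function here are made up for illustration):

class Person:
    def __init__(self, name, shoe_size):
        self.name = name                    # attributes of this specific person
        self.shoe_size = shoe_size

    def eat(self, food):                    # acts upon this person -> method
        print("%s eats %s" % (self.name, food))

def average_shoe_size(people):              # works on just any group of people -> function
    return sum(p.shoe_size for p in people) / float(len(people))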

Of course, it is not always trivial to translate this into the specific object you’re working with, but I find it is a good way to think of it.


Answer 4


Generally I use classes to implement a logical set of capabilities for some thing, so that in the rest of my program I can reason about the thing, not having to worry about all the little concerns that make up its implementation.

Anything that’s part of that core abstraction of “what you can do with a thing” should usually be a method. This generally includes everything that can alter a thing, as the internal data state is usually considered private and not part of the logical idea of “what you can do with a thing“.

When you come to higher level operations, especially if they involve multiple things, I find they are usually most naturally expressed as functions, if they can be built out of the public abstraction of a thing without needing special access to the internals (unless they’re methods of some other object). This has the big advantage that when I decide to completely rewrite the internals of how my thing works (without changing the interface), I just have a small core set of methods to rewrite, and then all the external functions written in terms of those methods will Just Work. I find that insisting that all operations to do with class X are methods on class X leads to over-complicated classes.

It depends on the code I’m writing though. For some programs I model them as a collection of objects whose interactions give rise to the behavior of the program; here most important functionality is closely coupled to a single object, and so is implemented in methods, with a scattering of utility functions. For other programs the most important stuff is a set of functions that manipulate data, and classes are in use only to implement the natural “duck types” that are manipulated by the functions.


Answer 5


You may say that, “in the face of ambiguity, refuse the temptation to guess”.

However, it’s not even a guess. You’re absolutely sure that the outcomes of both approaches are the same in that they solve your problem.

I believe it is only a good thing to have multiple ways to accomplishing goals. I’d humbly tell you, as other users did already, to employ whichever “tastes better” / feels more intuitive, in terms of language.


Python coding standards/best practices

Question: Python coding standards/best practices


In python do you generally use PEP 8 — Style Guide for Python Code as your coding standards/guidelines? Are there any other formalized standards that you prefer?


Answer 0


“In python do you generally use PEP 8 — Style Guide for Python Code as your coding standards/guidelines? Are there any other formalized standards that you prefer?”

As you mentioned, follow PEP 8 for the main text, and PEP 257 for docstring conventions.

Along with the Python style guides, I suggest that you refer to the following:

  1. Code Like a Pythonista: Idiomatic Python
  2. Common mistakes and Warts
  3. How not to write Python code
  4. Python gotcha

Answer 1


I follow the Python Idioms and Efficiency guidelines, by Rob Knight. I think they are exactly the same as PEP 8, but are more synthetic and based on examples.

If you are using wxPython you might also want to check Style Guide for wxPython code, by Chris Barker, as well.


Answer 2


I stick to PEP-8 very closely.

There are three specific things that I can’t be bothered to change to PEP-8.

  • Avoid extraneous whitespace immediately inside parentheses, brackets or braces.

    Suggested: spam(ham[1], {eggs: 2})

    I do this anyway: spam( ham[ 1 ], { eggs: 2 } )

    Why? 30+ years of ingrained habit of snuggling ()’s up against function names or (in C) statement keywords, starting with Fortran IV in the 70’s.

  • Use spaces around arithmetic operators:

    Suggested: x = x * 2 - 1

    I do this anyway: x= x * 2 - 1

    Why? Gries’ The Science of Programming suggested this as a way to emphasize the connection between assignment and the variable whose state is being changed.

    It doesn’t work well for multiple assignment or augmented assignment; for those I use lots of spaces.

  • For function names, method names and instance variable names

    Suggested: lowercase, with words separated by underscores as necessary to improve readability.

    I do this anyway: camelCase

    Why? 20+ years of ingrained habit of camelCase, starting with Pascal in the 80’s.


Answer 3


PEP 8 is good; the only thing that I wish it came down harder on is the tabs-vs-spaces holy war.

Basically if you are starting a project in python, you need to choose Tabs or Spaces and then shoot all offenders on sight.


Answer 4


To add to bhadra’s list of idiomatic guides:

Check out Anthony Baxter’s presentation on Effective Python Programming (from OSCON 2005).

An excerpt:

# dict's setdefault method turns this:
if key in dictobj:
    dictobj[key].append(val)
else:
    dictobj[key] = [val]
# into this:
dictobj.setdefault(key,[]).append(val)
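
A related idiom (not from the presentation, just a common alternative) achieves the same with collections.defaultdict:

from collections import defaultdict

dictobj = defaultdict(list)
dictobj[key].append(val)   # missing keys automatically start out as empty lists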

Answer 5


I follow it extremely rigorously. The only god before PEP-8 is existing code bases.


Answer 6


Yes, I try to follow it as closely as possible.

I don’t follow any other coding standards.


Answer 7


I follow PEP 8; it is a great coding style.


Tool to convert Python code to be PEP8 compliant

Question: Tool to convert Python code to be PEP8 compliant


I know there are tools which validate whether your Python code is compliant with PEP8, for example there is both an online service and a python module.

However, I cannot find a service or module which can convert my Python file to a self-contained, PEP8 valid Python file. Does anyone know if there are any?
I assume it’s feasible since PEP8 is all about the appearance of the code, right?


Answer 0


Unfortunately “pep8 storming” (the entire project) has several negative side-effects:

  • lots of merge-conflicts
  • break git blame
  • make code review difficult

As an alternative (and thanks to @y-p for the idea), I wrote a small package which autopep8s only those lines which you have been working on since the last commit/branch:

Basically leaving the project a little better than you found it:

pip install pep8radius

Suppose you’ve done your work off of master and are ready to commit:

# be somewhere in your project directory
# see the diff with pep, see the changes you've made since master
pep8radius master --diff
# make those changes
pep8radius master --diff --in-place

Or to clean the new lines you’ve commited since the last commit:

pep8radius --diff
pep8radius --diff --in-place

# the lines which changed since a specific commit `git diff 98f51f`
pep8radius 98f51f --diff

Basically pep8radius is applying autopep8 to lines in the output of git/hg diff (from the last shared commit).

This script currently works with git and hg; if you’re using something else and want this to work, please post a comment/issue/PR!


Answer 1


You can use autopep8! Whilst you make yourself a cup of coffee this tool happily removes all those pesky PEP8 violations which don’t change the meaning of the code.

Install it via pip:

pip install autopep8

Apply this to a specific file:

autopep8 py_file --in-place

or to your project (recursively), the verbose option gives you some feedback of how it’s going:

autopep8 project_dir --recursive --in-place --pep8-passes 2000 --verbose

Note: sometimes the default of 100 passes isn’t enough; I set it to 2000 as it’s reasonably high and will catch all but the most troublesome files (it stops making passes once it finds no resolvable pep8 infractions)…

At this point I suggest retesting and doing a commit!

If you want “full” PEP8 compliance: one tactic I’ve used is to run autopep8 as above, then run PEP8, which prints the remaining violations (file, line number, and what):

pep8 project_dir --ignore=E501

and manually change these individually (e.g. E712s – comparison with boolean).
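
For instance, a typical E712 fix looks like this (illustrative snippet):

# flagged by pep8 (E712): comparison to True should not use ==
if flag == True:
    do_something()

# preferred
if flag:
    do_something()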

Note: autopep8 offers an --aggressive argument (to ruthlessly “fix” these meaning-changing violations), but beware if you do use aggressive you may have to debug… (e.g. in numpy/pandas True == np.bool_(True) but not True is np.bool_(True)!)

You can check how many violations of each type (before and after):

pep8 --quiet --statistics .

Note: I consider E501s (line too long) to be a special case, as there will probably be a lot of these in your code and sometimes they are not corrected by autopep8.

As an example, I applied this technique to the pandas code base.


Answer 2


@Andy Hayden gave a good overview of autopep8. In addition to that, there is one more package called pep8ify which does the same thing.

However, both packages can only remove lint errors; they cannot reformat code.

little = more[3:   5]

The code above remains the same even after pep8ifying, but it still doesn’t look good. You can use formatters like yapf, which will format the code even if it is already PEP8 compliant. The code above will be formatted to

little = more[3:5]

Sometimes this even destroys your manual formatting. For example

BAZ = {
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
}

will be converted to

BAZ = {[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]}

But you can tell it to ignore some parts.

BAZ = {
   [1, 2, 3, 4],
   [5, 6, 7, 8],
   [9, 10, 11, 12]
}  # yapf: disable

Taken from my old blog post: Automatically PEP8 & Format Your Python Code!


Answer 3


If you’re using Eclipse + PyDev, you can simply activate autopep8 from PyDev’s settings: Windows -> Preferences -> type ‘autopep8’ in the search filter.

Check the ‘use autopep8.py for code formatting?’ -> OK

Now eclipse’s CTRL-SHIFT-F code formatting should format your code using autopep8 :)


Answer 4


I did extensive research on the different tools for Python and code style. There are two types of tools: linters, which analyze your code, warn about badly used code style and show advice on how to fix it, and code formatters, which re-format your document using the PEP style when you save your file.

Because re-formatting has to be more accurate – if it reformats something you didn’t want changed, it becomes useless – formatters cover a smaller part of PEP 8, while linters flag much more.

They also differ in how configurable they are – for example, pylint is configurable in all of its rules (you can turn every type of warning on or off), while black is not configurable at all.

Here are some useful links and tutorials:

Documentation:

Linters (in order of popularity):

Code formatters (in order of popularity):


Answer 5


There are many.

IDEs usually have some formatting capability built in. IntelliJ Idea / PyCharm does, same goes for the Python plugin for Eclipse, and so on.

There are formatters/linters that can target multiple languages. https://coala.io is a good example of those.

Then there are the single purpose tools, of which many are mentioned in other answers.

One specific method of automatic reformatting is to parse the file into an AST (without dropping comments) and then dump it back to text (meaning none of the original formatting is preserved). An example of that would be https://github.com/python/black.


Simple way to create a matrix of random numbers

Question: Simple way to create a matrix of random numbers


I am trying to create a matrix of random numbers, but my solution is too long and looks ugly

random_matrix = [[random.random() for e in range(2)] for e in range(3)]

this looks ok, but in my implementation it is

weights_h = [[random.random() for e in range(len(inputs[0]))] for e in range(hiden_neurons)]

which is extremely unreadable and does not fit on one line.


Answer 0


Take a look at numpy.random.rand:

Docstring: rand(d0, d1, …, dn)

Random values in a given shape.

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).


>>> import numpy as np
>>> np.random.rand(2,3)
array([[ 0.22568268,  0.0053246 ,  0.41282024],
       [ 0.68824936,  0.68086462,  0.6854153 ]])

Answer 1


You can drop the range(len()):

weights_h = [[random.random() for e in inputs[0]] for e in range(hiden_neurons)]

But really, you should probably use numpy.

In [9]: numpy.random.random((3, 3))
Out[9]:
array([[ 0.37052381,  0.03463207,  0.10669077],
       [ 0.05862909,  0.8515325 ,  0.79809676],
       [ 0.43203632,  0.54633635,  0.09076408]])

Answer 2


Use np.random.randint(), as numpy.random.random_integers() is deprecated:

random_matrix = numpy.random.randint(min_val,max_val,(<num_rows>,<num_cols>))

Answer 3


Looks like you are doing a Python implementation of the Coursera Machine Learning Neural Network exercise. Here’s what I did for randInitializeWeights(L_in, L_out)

#get a random array of floats between 0 and 1 as Pavel mentioned 
W = numpy.random.random((L_out, L_in +1))

#normalize so that it spans a range of twice epsilon
W = W * 2 * epsilon

#shift so that mean is at zero
W = W - epsilon

Answer 4


First, create a numpy array, then convert it into a matrix. See the code below:

import numpy

B = numpy.random.random((3, 4)) #its ndArray
C = numpy.matrix(B)# it is matrix
print(type(B))
print(type(C)) 
print(C)

Answer 5


x = np.int_(np.random.rand(10) * 10)

This gives random integers out of 10 (i.e. from 0 to 9). For numbers out of 20, multiply by 20 instead.


Answer 6


When you say “a matrix of random numbers”, you can use numpy as Pavel mentioned above (https://stackoverflow.com/a/15451997/6169225); in this case I’m assuming it is irrelevant to you what distribution these (pseudo) random numbers adhere to.

However, if you require a particular distribution (I imagine you are interested in the uniform distribution), numpy.random has very useful methods for you. For example, let’s say you want a 3×2 matrix with a pseudo random uniform distribution bounded by [low,high]. You can do this like so:

numpy.random.uniform(low,high,(3,2))

Note that you can replace uniform with any of the distributions supported by this library.

Further reading: https://docs.scipy.org/doc/numpy/reference/routines.random.html


Answer 7


A simple way of creating an array of random integers is:

matrix = np.random.randint(maxVal, size=(rows, columns))

The following outputs a 2 by 3 matrix of random integers from 0 to 9 (the upper bound of 10 is exclusive):

a = np.random.randint(10, size=(2,3))

Answer 8


For creating an array of random numbers, NumPy provides array creation using:

  1. Real numbers

  2. Integers

For creating an array using random real numbers, there are 2 options:

  1. random.rand (for uniform distribution of the generated random numbers )
  2. random.randn (for normal distribution of the generated random numbers )

random.rand

import numpy as np 
arr = np.random.rand(row_size, column_size) 

random.randn

import numpy as np 
arr = np.random.randn(row_size, column_size) 

For creating an array using random integers:

import numpy as np
numpy.random.randint(low, high=None, size=None, dtype='l')

where

  • low = Lowest (signed) integer to be drawn from the distribution
  • high(optional)= If provided, one above the largest (signed) integer to be drawn from the distribution
  • size(optional) = Output shape i.e. if the given shape is, e.g., (m, n, k), then m * n * k samples are drawn
  • dtype(optional) = Desired dtype of the result.

e.g.:

The given example will produce an array of random integers between 0 and 4; its size will be 5*5 and it will contain 25 integers:

arr2 = np.random.randint(0,5,size = (5,5))

In order to create a 5 by 5 matrix, the size must be written as the tuple (5,5): note the comma between the dimensions, not a multiplication symbol *.

A sample output:

[[2 1 1 0 1]
 [3 2 1 4 3]
 [2 3 0 3 3]
 [1 3 1 0 0]
 [4 1 2 0 1]]

e.g. 2:

The given example will produce an array of random integers between 0 and 1; its size will be 1*10 and it will contain 10 integers:

arr3= np.random.randint(2, size = 10)

[0 0 0 0 1 1 0 0 1 1]


Answer 9

import random

random_matrix = [[random.random() for j in range(columns)] for i in range(rows)]
for i in range(rows):
    print(random_matrix[i])

Answer 10


An answer using map:

map(lambda x: map(lambda y: random.random(), range(len(inputs[0]))), range(hiden_neurons))

Answer 11

#this is a function for a square matrix, so in the while loop rows does not have to be less than cols.
#you can make your own condition, but if you want a square matrix, use this code.

import random
import numpy as np

def random_matrix(R, cols):
    matrix = []
    rows = 0
    while rows < cols:
        N = random.sample(R, cols)
        matrix.append(N)
        rows = rows + 1
    return np.array(matrix)

print(random_matrix(range(10), 5))
#make sure you understand the function random.sample

Answer 12


numpy.random.rand(row, column) generates random numbers between 0 and 1, according to the specified (m,n) parameters given. So use it to create an (m,n) matrix, multiply the matrix by the size of the desired range (high - low), and add the low limit.

Analysis: if zero is generated, just the low limit will be held, and if one were generated, just the high limit would be held. In other words, by scaling the output of rand this way you can generate numbers spanning the desired extremes.

import numpy as np

high = 10
low = 5
m,n = 2,2

a = (high - low)*np.random.rand(m,n) + low

Output:

a = array([[5.91580065, 8.1117106 ],
          [6.30986984, 5.720437  ]])

What kind of patterns can I enforce on my code to make it easier to translate to another programming language? [closed]

Question: What kind of patterns can I enforce on my code to make it easier to translate to another programming language? [closed]


I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IOC and dependency injection, as they might make the translation process easier and less error prone.

  • I’ll make use of Python’s parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don’t need a perfect translation. I’ll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask “What the hell is the point of this?” The answer is… It’ll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (i.e.: IoC, SOA?) than in how to do the translation.


Answer 0


I’ve been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There’s nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you’ll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each “reliable” language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a “hell of a learning experience”; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are great example). The good news, is that part of the job is done. The bad news is that machinery has a zillion assumptions built into it, most of which you won’t discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun).

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman’s compiler book doesn’t stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.

The remark about “I don’t need a perfect translation” is troublesome. What weak translators do is convert the “easy” 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don’t understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people that distrust automated tools and insist on translating million line systems by hand; that’s even harder and they normally find out painfully with long time delays, high costs and often outright failure.)

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can’t complete the manual part of the translation activity.

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don’t justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that’s painful.

I consider our tools to be extremely good (but then, I’m pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.


Answer 1


My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.

In short: do not use the parser module, there’s an easier way.

The ast module, available since Python 2.6, is much more suitable for your needs, since it gives you a ready-made AST to work with. I wrote an article on this last year, but in short: use the parse method of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
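
For instance, a minimal sketch of what that front end might start from (the sample source string is made up):

import ast

tree = ast.parse("x = [i * i for i in range(10)]")
print(ast.dump(tree))             # inspect the structure of the AST
for node in ast.walk(tree):       # visit every node, e.g. to drive code generation
    print(type(node).__name__)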

Now, since Python’s ASTs are quite detailed, given an AST the front-end job isn’t terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you’ll need heavy machinery to emulate one language’s core in another. For example consider Python’s generators and list comprehensions which don’t exist in PHP (to my best knowledge, which is admittedly poor when PHP is involved).

To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.


回答 2

编写翻译不是没有可能,尤其是考虑到乔尔的实习生是在夏天完成的。

如果您想讲一种语言,这很容易。如果您想做更多的事情,那会有些困难,但不要太多。最难的部分是,尽管任何图灵完备的语言都可以完成另一种图灵完备的语言所能做的事情,但是内置数据类型却可以显着改变一种语言所要做的事情。

例如:

word = 'This is not a word'
print word[::-2]

需要大量C++代码才能复现(好吧,您可以用一些循环结构写得相当短,但仍然如此)。

我想这有点离题了。

您是否曾经根据语言语法编写过分词器/解析器?如果没有,您可能想学习如何做,因为这是该项目的主要部分。我要做的是提供基本的Turing完整语法-与Python 字节码相当相似 。然后创建一个采用语言语法的词法分析器/解析器(也许使用BNF),并基于该语法将语言编译为中间语言。然后,您需要做的是相反的操作-根据语法将您的语言创建为目标语言的解析器。

我看到的最明显的问题是,一开始您可能会创建极其低效的代码,尤其是在Python等功能更强大的语言中。

但是,如果以这种方式进行操作,那么您可能会一直想出优化输出的方法。总结一下:

  • 阅读提供的语法
  • 将程序编译成中间(也包括图灵完整)语法
  • 将中间程序编译成最终语言(基于提供的语法)
  • …?
  • 利润!(?)

*功能强大,我的意思是这需要4行:

myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]

向我展示另一种可以在4行中完成类似工作的语言,并且我将向您展示一种与Python一样强大的语言。

Writing a translator isn’t impossible, especially considering that Joel’s Intern did it over a summer.

If you want to do one language, it’s easy. If you want to do more, it’s a little more difficult, but not too much. The hardest part is that, while any turing complete language can do what another turing complete language does, built-in data types can change what a language does phenomenally.

For instance:

word = 'This is not a word'
print word[::-2]

takes a lot of C++ code to duplicate (ok, well you can do it fairly short with some looping constructs, but still).

That’s a bit of an aside, I guess.

Have you ever written a tokenizer/parser based on a language grammar? You’ll probably want to learn how to do that if you haven’t, because that’s the main part of this project. What I would do is come up with a basic Turing complete syntax – something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps using BNF), and based on the grammar, compiles the language into your intermediate language. Then what you’ll want to do is do the reverse – create a parser from your language into target languages based on the grammar.

The most obvious problem I see is that at first you’ll probably create horribly inefficient code, especially in more powerful* languages like Python.

But if you do it this way then you’ll probably be able to figure out ways to optimize the output as you go along. To summarize:

  • read provided grammar
  • compile program into intermediate (but also Turing complete) syntax
  • compile intermediate program into final language (based on provided grammar)
  • …?
  • Profit!(?)

*by powerful I mean that this takes 4 lines:

myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]

Show me another language that can do something like that in 4 lines, and I’ll show you a language that’s as powerful as Python.


回答 3

有几个答案告诉您不要打扰。好吧,那有什么帮助?你想学习吗?你可以学习。这是编译。碰巧您的目标语言不是机器代码,而是另一种高级语言。这一直都在做。

有一种相对简单的入门方法。首先,进入http://sourceforge.net/projects/lime-php/(如果您要使用PHP)或类似的代码,并查看示例代码。接下来,您可以使用一系列正则表达式编写词法分析器,并将令牌提供给生成的解析器。您的语义动作既可以直接使用另一种语言输出代码,也可以构建一些数据结构(例如对象,人),您可以对其进行按摩和遍历以生成输出代码。

您对PHP和Python很幸运,因为在很多方面,它们是彼此相同的语言,但是语法不同。困难的部分是克服语法形式和数据结构之间的语义差异。例如,Python具有列表和字典,而PHP仅具有assoc数组。

“学习者”方法是为语言的受限子集(例如仅打印语句,简单的数学和变量赋值)构建可以正常运行的内容,然后逐步消除限制。这基本上就是该领域的“大人物”所做的。

哦,由于您在Python中没有静态类型,因此最好编写并依赖PHP函数,例如“ python_add”,该函数根据Python的执行方式添加数字,字符串或对象。

显然,如果您允许它会变得更大。

There are a couple answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn’t machine code, but another high-level language. This is done all the time.

There’s a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.
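
What such a regex-based lexical analyzer can look like, as a hedged sketch in Python (the token names and the tiny grammar here are made up purely for illustration):

import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('NAME',   r'[A-Za-z_]\w*'),
    ('OP',     r'[=+\-*/()]'),
    ('SKIP',   r'[ \t]+'),
]
TOKEN_RE = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(line):
    # yield (kind, text) pairs for one line of source, dropping whitespace
    for match in TOKEN_RE.finditer(line):
        if match.lastgroup != 'SKIP':
            yield match.lastgroup, match.group()

print(list(tokenize('x = 40 + 2')))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]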

You’re lucky with PHP and Python because in many respects they are the same language as each other, but with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has assoc arrays.

The “learner” approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That’s basically what the “big” guys in the field all did.

Oh, and since you don’t have static types in Python, it might be best to write and rely on PHP functions like “python_add” which adds numbers, strings, or objects according to the way Python does it.

Obviously, this can get much bigger if you let it.
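
To make the "learner" restricted-subset idea above concrete, here is a hedged toy sketch in Python: it only understands constant assignments and print-of-a-variable and emits PHP-flavoured text. It is nowhere near a real translator, just the shape of such a front end:

import ast

def translate(source):
    # toy front end: only constant assignments and print-of-a-name are supported
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            value = ast.literal_eval(node.value)
            lines.append('$%s = %r;' % (node.targets[0].id, value))
        elif (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)
              and getattr(node.value.func, 'id', None) == 'print'):
            lines.append('echo $%s;' % node.value.args[0].id)
        else:
            raise NotImplementedError(ast.dump(node))
    return '\n'.join(lines)

print(translate("x = 41\nprint(x)"))
# $x = 41;
# echo $x;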


回答 4

对于使用ast.parse而不是parser模块(我之前并不知道)这一点,我赞同@EliBendersky的观点,也热烈建议您去看看他的博客。我使用ast.parse编写了一个Python->JavaScript转换器(https://bitbucket.org/amirouche/pythonium)。Pythonium的设计是我在审视其他实现并亲自尝试之后得出的。我是从 https://github.com/PythonJS/PythonJS(也是我发起的项目)fork出Pythonium的,它实际上是一次完全重写。整体设计灵感来自PyPy和 http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf 这篇论文。

我尝试过的所有事情,从开始到最佳解决方案,即使看起来像是Pythonium营销,实际上也不是(不要犹豫告诉我,网络礼仪是否看起来不正确):

  • 使用原型继承在Plain Old JavaScript中实现Python语义:AFAIK无法使用JS原型对象系统实现Python多重继承。后来我确实尝试使用其他技巧来做到这一点(参见getattribute)。据我所知,JavaScript中没有实现Python多重继承,最好的是单一继承+ mixins,但我不确定它们是否可以处理钻石继承。类似于Skulpt,但没有Google Clojure。

  • 我尝试过使用Google clojure,就像Skulpt(编译器)一样,而不是实际阅读Skulpt代码#fail。无论如何因为基于JS原型的对象系统仍然是不可能的。创建绑定非常困难,您需要编写JavaScript和大量样板代码(请参阅https://github.com/skulpt/skulpt/issues/50,其中我是幽灵)。那时,还没有明确的方法将绑定集成到构建系统中。我认为Skulpt是一个库,您只需要在html中包含.py文件即可执行,开发人员无需进行任何编译阶段。

  • 尝试过pyjaco(编译器),但是创建绑定(从Python代码调用Javascript代码)非常困难,每次创建的样板代码太多。现在,我认为pyjaco更接近Pythonium。pyjaco是用Python编写的(也是ast.parse),但是很多是用JavaScript编写的,并且使用原型继承。

我从未真正成功运行过Pyjamas #fail,也没有再去尝试读它的代码 #fail。但在我的印象里,Pyjamas做的是API->API(或者说框架到框架)的转换,而不是Python到JavaScript的转换:JavaScript框架消费页面中已有的数据或来自服务器的数据,Python代码只是"管道"。后来我才发现Pyjamas实际上是一个真正的python->js转换器。

我仍然认为API->API(或框架->框架)的转换是可行的,这基本上就是我在Pythonium中做的事情,只是层次更低。Pyjamas可能使用了与Pythonium相同的算法…

然后,我发现brython完全用Javascript编写,例如Skulpt,不需要编译和大量的绒毛…而是用JavaScript编写。

自从在该项目的过程中编写了第一行代码以来,我就知道PyPy,甚至包括PyPy的JavaScript后端。是的,如果找到它,您可以直接从PyPy用JavaScript生成一个Python解释器。人们说那是一场灾难,但我没有读到为什么。我认为原因是它们用于实现解释器的中间语言RPython,是为转换为C(也许还有asm)而定制的Python子集。Ira Baxter说,在构建某些东西时,您总是会做一些假设,并且会针对它的目标用途进行微调,对PyPy来说就是Python->C转换。这些假设在其他情况下可能并不适用,更糟糕的是,它们会引入开销;换句话说,直接翻译很可能总是更好。

用Python编写解释器听起来是一个(非常)好主意。但是出于性能原因,我对编译器更感兴趣,实际上将Python编译为JavaScript比解释它更容易。

我以将可以轻松转换为JavaScript的Python子集组合在一起的想法开始了PythonJS。起初,由于过去的经验,我什至没有去实施OO系统。我实现的翻译成JavaScript的Python子集是:

  • 在定义和调用中具有全参数语义的函数。这是我最引以为傲的部分。
  • while / if / elif / else
  • Python类型已转换为JavaScript类型(没有任何类型的python类型)
  • for只能迭代Javascript数组(对于in数组)
  • 透明访问JavaScript:如果您使用Python代码编写Array,它将被转换为JavaScript中的Array。就可用性而言,这是其竞争对手的最大成就。
  • 您可以将Python源代码中定义的函数传递给javascript函数。默认参数将被考虑在内。
  • 它添加了一个名为new的特殊功能,该功能被转换为JavaScript new,例如:new(Python)(1,2,spam,“ egg”)被转换为“ new Python(1,2,spam,” egg“)。
  • 翻译人员会自动处理“ var”。(来自Brett(PythonJS贡献者)的发现非常好。
  • 全局关键字
  • 关闭
  • Lambdas
  • 清单理解
  • 通过requirejs支持导入
  • 单类继承+通过classyjs的mixin

与Python的完整语义相比,这看起来很多,但实际上非常狭窄。它实际上是带有Python语法的JavaScript。

生成的JS是完美的,即没有任何开销,无法通过进一步编辑来提升性能。如果生成的代码还能改进,那么同样的改进也可以直接在Python源文件里完成。此外,编译器不依赖任何您能在 http://superherojs.com/ 那类手写.js中见到的JS技巧,因此生成的代码非常易读。

PythonJS这部分的直接后代是Pythonium Veloce模式。完整的实现可以在 https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master 中找到,共793 SLOC,外加大约100 SLOC与其他翻译器共享的代码。

可以在Veloce模式下翻译pystones.py的改编版本,参见 https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master

设置好基本的Python->JavaScript转换后,我选择了另一条路径来转换完整的Python:采用glib那种基于类的面向对象代码的做法,只不过目标语言是JS,因此您可以使用数组、类似map的对象和许多其他技巧,而这一部分全部是用Python编写的。IIRC,Pythonium转换器中没有手写的JavaScript代码。实现单一继承并不困难,以下才是使Pythonium完全兼容Python的困难部分:

  • spam.egg 在Python中总是翻译为 getattribute(spam, "egg")。我没有专门对此做性能分析,但我认为大量时间就是耗在这里,而且我不确定能否用asm.js或其他方式对其进行改进。
  • 方法解析顺序:即使算法本身是用Python编写的,把它翻译成Python Veloce兼容的代码也是一项巨大的工作。
  • getattribute:实际的getattribute解析算法有点棘手,而且它仍然不支持数据描述符。
  • 基于元类的类:我知道该在哪里插入代码,但仍然…
  • 最后但并非最不重要的一点:some_callable(…)始终被转换为 call(some_callable)。AFAIK转换器完全不做类型推断,因此每次调用时,都需要检查被调用对象是哪种对象,才能以正确的方式调用它。

这部分代码被提取到 https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master 中。它是用Python编写的,并且与Python Veloce兼容。

实际的compliant翻译器 https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master 不会直接生成JavaScript代码,最重要的是不会进行ast->ast转换。我试过ast->ast的做法,但ast即使比cst好一些,配合ast.NodeTransformer用起来也并不顺手;更重要的是,我根本不需要做ast->ast。

就我而言,至少对python ast做python ast可能会提高性能,因为我有时会在生成与块相关的代码之前检查块的内容,例如:

  • var / global:要能对某个东西加var,我必须知道哪些需要var、哪些不需要。与其先生成代码块、跟踪该块中创建了哪些变量、再把声明插到生成的函数块顶部,我只是在进入该块、真正访问子节点生成相关代码之前,先查找其中相关的变量赋值。
  • 到目前为止,生成器在JS中具有特殊的语法,因此当我要编写“ var my_generator = function”时,我需要知道哪个Python函数是生成器

因此,对于翻译的每个阶段,我都不会真正访问每个节点。

整个过程可以描述为:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python内置函数是用Python代码(!)编写的,IIRC有一些与引导(bootstrap)类型相关的限制,但您可以使用Pythonium在compliant模式下能够转换的一切。可以看看 https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

可以理解从pythonium兼容生成的JS代码的阅读,但是源映射将有很大帮助。

根据这种经验,我可以给您的宝贵建议是老屁:

  • 无论是在文献上还是在现有项目中,都对该主题进行了广泛的审查,这些项目是封闭的或免费的。当我回顾现有的不同项目时,我应该给它更多的时间和动力。
  • 问问题!如果我事先知道PyPy后端因为C/JavaScript语义不匹配带来的开销而没有用处,我可能早在6个月前、甚至3年前就想到Pythonium这个主意了。
  • 知道你想做什么,有一个目标。对于这个项目,我有不同的目标:使用一点点javascript,学习更多Python知识,并能够编写将在浏览器中运行的Python代码(更多内容以及下面的内容)。
  • 失败就是经验
  • 一小步就是一步
  • 从小开始
  • 远大的梦想
  • 做演示
  • 重复

仅使用Python Veloce模式,我感到非常高兴!但是一直以来,我发现我真正想要的是将我和其他人从Javascript中解放出来,但更重要的是能够以舒适的方式进行创建。这使我了解了Scheme,DSL,模型以及最终特定于域的模型(请参阅http://dsmforum.org/)。

关于Ira Baxter的回应:

这种估算完全没有帮助。PythonJS和Pythonium加起来花了我大约6个月的业余时间,所以6个月的全职工作可以期望做到更多。我想我们都知道在企业环境中100人年意味着什么,又不意味着什么…

当有人说某件事很难、或者更常见地说不可能时,我会回答"为看似不可能的问题找到解决方案只是时间问题";换句话说,没有什么是不可能的,除非它被证明是不可能的,而那需要一个数学证明…

如果没有证明不可能的话,那么它就有想象力的余地:

  • 寻找证明是不可能的

  • 如果这是不可能的,则可能存在可以解决的“劣等”问题。

要么

  • 如果不是不可能,那就找到解决办法

不只是乐观的想法。当我启动Python-> Javascript时,每个人都说这是不可能的。PyPy不可能。元类太难了。等…我认为,唯一使PyPy超过Scheme-> C纸(已有25年历史)的革命是一些自动JIT生成(基于我认为是用RPython解释器编写的提示)。

大多数说某事"困难"或"不可能"的人没有给出原因。C++很难解析?我知道,但仍然有(免费的)C++解析器。魔鬼藏在细节里?我知道。仅仅说不可能是没有帮助的,它比"没有帮助"更糟糕:它令人沮丧,而且有些人就是想打击别人。我是通过 https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus 听说这个问题的。

什么对您来说是完美?这样便可以定义下一个目标,甚至可以达到整体目标。

我更想知道我可以对代码强制执行哪种类型的模式,而不是如何进行翻译,从而使代码的翻译(即:IoC,SOA?)更容易。

我看不出有什么模式是不能从一种语言翻译到另一种语言的,至少以一种不那么完美的方式也可以做到。既然语言到语言的翻译是可行的,您最好先以此为目标。我认为,按照 http://en.wikipedia.org/wiki/Graph_isomorphism_problem 的说法,两种计算机语言之间的翻译是一个树或DAG同构问题。何况我们已经知道它们都是图灵完备的,所以…

Framework->Framework的转换(我更愿意把它理解成API->API的转换)仍然是您可以记在心里的,用来改进生成的代码。例如:Prolog的语法非常特殊,但您仍然可以通过在Python中描述同样的图来做类似Prolog的计算…如果要我实现一个从Prolog到Python的转换器,我不会用Python实现合一(unification),而是放在一个C库里,并配上一套对Python使用者来说非常易读的"Python语法"。归根结底,语法只是我们赋予含义的"外衣"(这也是我开始研究scheme的原因)。语言的魔鬼藏在细节里,我说的不是语法。语言中用到的概念,比如getattribute钩子(没有它也能活),问题不大;但所需的VM特性(如尾递归优化)可能很难处理。如果初始程序不使用尾递归,您就不用在意;即使目标语言中没有尾递归,也可以用greenlets/事件循环来模拟它。

对于目标语言和源语言,请查找:

  • 大而具体的想法
  • 微小且共同的想法

由此将出现:

  • 容易翻译的东西
  • 难以翻译的事物

您也许还可以知道将翻译成快速和慢速代码的内容。

还有stdlib或任何库的问题,但没有明确的答案,这取决于您的目标。

惯用(idiomatic)代码或可读的生成代码也都有相应的解决方案…

因为可以提供慢速和/或关键路径的C实现,所以针对PHP之类的平台比针对浏览器要容易得多。

鉴于您的第一个项目是将Python转换为PHP,至少对于我所知道的PHP3子集,自定义veloce.py是最好的选择。如果您可以为PHP实现veloce.py,则可能可以运行兼容模式…同样,如果您可以将PHP转换为可以用php_veloce.py生成的PHP子集,则意味着您可以将PHP转换为veloce.py可以使用的Python子集,这意味着您可以将PHP转换为Javascript。只是说…

您还可以查看这些库:

另外,您可能对此博客文章(和评论)感兴趣:https : //www.rfk.id.au/blog/entry/pypy-js-poc-jit/

I will second @EliBendersky's point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend reviewing his blog. I used ast.parse to write a Python->JavaScript translator (https://bitbucket.org/amirouche/pythonium). I came up with the Pythonium design by reviewing other implementations and trying things on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS, which I also started; it's actually a complete rewrite. The overall design is inspired by PyPy and the http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.

Everything I tried, from beginning to the best solution, even if it looks like Pythonium marketing it really isn’t (don’t hesitate to tell me if something doesn’t seem correct to the netiquette):

  • Implement Python semantic in Plain Old JavaScript using prototype inheritance: AFAIK it’s impossible to implement Python multiple inheritance using JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript, the best that exists is Single inhertance + mixins and I’m not sure they handle diamond inheritance. Kind of similar to Skulpt but without google clojure.

  • I tried with Google clojure, just like Skulpt (compiler) instead of actually reading Skulpt code #fail. Anyway because of JS prototype based object system still impossible. Creating binding was very very difficult, you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the binding in the build system. I think that Skulpt is a library and you just have to include your .py files in the html to be executed, no compilation phase required to be done by the developer.

  • Tried pyjaco (compiler) but creating bindings (calling JavaScript code from Python code) was very difficult; there was too much boilerplate code to create every time. Now I think pyjaco is the one nearest to Pythonium. pyjaco is written in Python (ast.parse too) but a lot of it is written in JavaScript and it uses prototype inheritance.

I never actually succeeded at running Pyjamas #fail and never tried to read the code again #fail. But in my mind Pyjamas was doing API->API translation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consumes data that is already in the page or data from the server; the Python code is only "plumbing". After that I discovered that Pyjamas was actually a real python->js translator.

Still, I think it's possible to do API->API (or framework->framework) translation, and that's basically what I do in Pythonium but at a lower level. Probably Pyjamas uses the same algorithm as Pythonium…

Then I discovered brython fully written in Javascript like Skulpt, no need for compilation and lot of fluff… but written in JavaScript.

Since the first line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say it was a disaster. I read nowhere why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something, and probably you fine-tune it to be the best at what it's meant to do, which in the case of PyPy is Python->C translation. Those assumptions might not be relevant in another context; worse, they can introduce overhead. In other words, a direct translation will most likely always be better.

Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons also it’s actually more easy to compile Python to JavaScript than interpret it.

I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn't even bother to implement an OO system because of past experience. The subset of Python that I managed to translate to JavaScript is:

  • function with full parameters semantic both in definition and calling. This is the part I am most proud of.
  • while/if/elif/else
  • Python types were converted to JavaScript types (there is no python types of any kind)
  • for could iterate over Javascript arrays only (for a in array)
  • Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in javascript. This is the biggest achievement in terms of usability over its competitors.
  • You can pass function defined in Python source to javascript functions. Default arguments will be taken into account.
  • It adds a special function called new which is translated to JavaScript new, e.g.: new(Python)(1, 2, spam, "egg") is translated to "new Python(1, 2, spam, "egg")".
  • "var" declarations are automatically handled by the translator (a very nice finding from Brett, a PythonJS contributor).
  • global keyword
  • closures
  • lambdas
  • list comprehensions
  • imports are supported via requirejs
  • single class inheritance + mixin via classyjs

This seems like a lot but actually very narrow compared to full blown semantic of Python. It’s really JavaScript with a Python syntax.

The generated JS is perfect ie. there is no overhead, it can not be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any JS tricks that you can find in .js written by http://superherojs.com/, so it’s very readable.

The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found @ https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master 793 SLOC + around 100 SLOC of shared code with the other translator.

An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master

After having set up basic Python->JavaScript translation, I chose another path to translate full Python to JavaScript: the glib way of doing object-oriented, class-based code, except the target language is JS, so you have access to arrays, map-like objects and many other tricks, and all that part was written in Python. IIRC there is no hand-written JavaScript code in the Pythonium translator. Getting single inheritance is not difficult; here are the difficult parts of making Pythonium fully compliant with Python:

  • spam.egg in Python is always translated to getattribute(spam, "egg"). I did not profile this in particular, but I think that's where it loses a lot of time, and I'm not sure I can improve upon it with asm.js or anything else.
  • method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
  • getattribute: the actual getattribute resolution algorithm is kind of tricky and it still doesn't support data descriptors
  • metaclass class based: I know where to plug the code, but still…
  • last but not least: some_callable(…) is always translated to "call(some_callable)". AFAIK the translator doesn't use inference at all, so every time you do a call you need to check which kind of object it is, to call it the way it's meant to be called.

This part is factored into https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master. It's written in Python compatible with Python Veloce.
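
To make the getattribute(spam, "egg") idea above concrete, here is a rough Python model of such a lookup helper; this is only an illustrative assumption, not Pythonium's actual runtime code:

def getattribute(obj, name):
    # illustrative lookup: instance dict first, then the class MRO,
    # binding functions found on a class so they behave like methods
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            value = klass.__dict__[name]
            if hasattr(value, '__get__'):
                return value.__get__(obj, type(obj))
            return value
    raise AttributeError(name)

class Spam(object):
    def egg(self):
        return 42

print(getattribute(Spam(), 'egg')())   # 42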

The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn't generate JavaScript code directly and, most importantly, doesn't do ast->ast transformation. I tried the ast->ast approach, and an ast, even if nicer than a cst, is not nice to work with even with ast.NodeTransformer; more importantly, I don't need to do ast->ast.

Doing Python ast to Python ast, in my case at least, would maybe be a performance improvement, since I sometimes inspect the content of a block before generating the code associated with it, for instance:

  • var/global: to be able to var something I must know what I need to var and what I don't. Instead of generating a block, tracking which variables are created in that block, and inserting the declarations on top of the generated function block, I just look for the relevant variable assignments when I enter the block, before actually visiting the child nodes to generate the associated code.
  • yield: generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the "var my_generator = function" part

So I don’t really visit each node once for each phase of the translation.
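
As a hedged illustration of that pre-scan idea (again, not the actual Pythonium code), the assigned names and the generator-ness of a function can be discovered with one quick walk over its AST before any code is emitted; a real translator would also stop at nested function boundaries, which this sketch does not:

import ast

def names_and_is_generator(func_node):
    # one pass over an ast.FunctionDef: which names are assigned, and is it a generator?
    names = set()
    is_generator = False
    for node in ast.walk(func_node):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.Yield, ast.YieldFrom)):
            is_generator = True
    return names, is_generator

tree = ast.parse("def gen():\n    total = 0\n    yield total\n")
print(names_and_is_generator(tree.body[0]))   # ({'total'}, True)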

The overall process can be described as:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python builtins are written in Python code (!). IIRC there are a few restrictions related to bootstrapping types, but you have access to everything that Pythonium can translate in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

The JS code generated by Pythonium compliant mode can be read and understood, but source maps would greatly help.

The valuable advice I can give you in the light of this experience is the kind old farts give:

  • extensively review the subject both in literature and existing projects closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
  • ask questions! If I had known beforehand that the PyPy backend was useless because of the overhead due to the C/JavaScript semantic mismatch, I might have had the Pythonium idea well before 6 months ago, maybe 3 years ago.
  • know what you want to do, have a target. For this project I had different objectives: practice a bit of JavaScript, learn more Python, and be able to write Python code that would run in the browser (more on that below).
  • failure is experience
  • a small step is a step
  • start small
  • dream big
  • do demos
  • iterate

With Python Veloce mode only, I’m very happy! But along the way I discovered that what I was really looking for was liberating me and others from Javascript but more importantly being able to create in a comfortable way. This lead me to Scheme, DSL, Models and eventually domain specific models (cf. http://dsmforum.org/).

About what Ira Baxter response:

The estimations are not helpful at all. It took me more or less 6 months of free time for both PythonJS and Pythonium, so I can expect more from 6 months of full time. I think we all know what 100 man-years in an enterprise context can mean, and not mean at all…

When someone says something is hard, or more often impossible, I answer that "it only takes time to find a solution for a problem that seems impossible"; in other words, nothing is impossible unless it's proven impossible, in which case there is a math proof…

If it’s not proven impossible then it leaves room for imagination:

  • finding a proof proving it’s impossible

and

  • If it is impossible there may be an “inferior” problem that can have a solution.

or

  • if it’s not impossible, finding a solution

It's not just optimistic thinking. When I started Python->JavaScript everybody was saying it was impossible. PyPy: impossible. Metaclasses: too hard. Etc… I think that the only revolution that PyPy brings over the Scheme->C paper (which is 25 years old) is some automatic JIT generation (based on hints written in the RPython interpreter, I think).

Most people that say that a thing is "hard" or "impossible" don't provide the reasons. C++ is hard to parse? I know that; still, there are (free) C++ parsers. Evil is in the details? I know that. Saying it's impossible alone is not helpful; it's even worse than "not helpful", it's discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.

What would be perfection for you? That’s how you define next goal and maybe reach the overall goal.

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.

I see no patterns that cannot be translated from one language to another language, at least in a less-than-perfect way. Since language-to-language translation is possible, you'd better aim for this first. I think that, according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism, even if we already know that they are both Turing complete, so…

Framework->Framework, which I better visualize as API->API translation, might still be something to keep in mind as a way to improve the generated code. E.g.: Prolog has a very specific syntax, but you can still do Prolog-like computation by describing the same graph in Python… If I were to implement a Prolog-to-Python translator, I wouldn't implement unification in Python but in a C library, and come up with a "Python syntax" that is very readable for a Pythonist. In the end, syntax is only "painting" to which we give a meaning (that's why I started Scheme). Evil is in the details of the language, and I'm not talking about the syntax. Concepts used in the language, like the getattribute hook, are manageable (you can live without it), but required VM features like tail-recursion optimisation can be difficult to deal with. You don't care if the initial program doesn't use tail recursion, and even if there is no tail recursion in the target language, you can emulate it using greenlets/an event loop.
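
One hedged way to emulate missing tail-call optimisation, without reaching for greenlets, is a plain trampoline; the sketch below is only meant to show the idea:

def trampoline(func, *args):
    # keep calling as long as the "tail call" comes back as a thunk
    result = func(*args)
    while callable(result):
        result = result()
    return result

def countdown(n):
    if n == 0:
        return 'done'
    # return a thunk instead of recursing, so the Python stack never grows
    return lambda: countdown(n - 1)

print(trampoline(countdown, 100000))   # 'done', without hitting the recursion limit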

For target and source languages, look for:

  • Big and specific ideas
  • Tiny and common shared ideas

From this will emerge:

  • Things that are easy to translate
  • Things that are difficult to translate

You will also probably be able to know what will be translated to fast and slow code.

There is also the question of the stdlib or any library but there is no clear answer, it depends of your goals.

Idiomatic code or readable generated code have also solutions…

Targeting a platform like PHP is much more easy than targeting browsers since you can provide C-implementation of slow and/or critical path.

Given you first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP then probably you will be able to run the compliant mode… Also if you can translate PHP to the subset of PHP you can generate with php_veloce.py it means that you can translate PHP to the subset of Python that veloce.py can consume which would mean that you can translate PHP to Javascript. Just saying…

You can also have a look at those libraries:

Also you might be interested by this blog post (and comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/


回答 5

您可以看一下Vala编译器,该编译器将Vala(一种类似于C#的语言)转换为C。

You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.


在Python中强制命名参数

问题:在Python中强制命名参数

在Python中,您可能有一个函数定义:

def info(object, spacing=10, collapse=1)

可以通过以下任何一种方式调用:

info(odbchelper)                    
info(odbchelper, 12)                
info(odbchelper, collapse=0)        
info(spacing=15, object=odbchelper)

多亏了Python允许任意顺序的参数(只要它们被命名)。

我们遇到的问题是,随着一些较大的函数不断增长,人们可能会在spacing和collapse之间添加参数,这意味着错误的值可能会传给没有按名称指定的参数。此外,有时并不总是清楚需要传入什么。我们想找到一种方法来强迫人们命名某些参数:不只是编码规范,最好是一个标志或pydev插件?

因此,在上述4个示例中,由于所有参数均已命名,因此只有最后一个示例可以通过检查。

很可能我们只会对某些函数启用它,但是对于如何实现这一点(甚至这是否可行)的任何建议,我们都将不胜感激。

In Python you may have a function definition:

def info(object, spacing=10, collapse=1)

which could be called in any of the following ways:

info(odbchelper)                    
info(odbchelper, 12)                
info(odbchelper, collapse=0)        
info(spacing=15, object=odbchelper)

thanks to Python’s allowing of any-order arguments, so long as they’re named.

The problem we’re having is as some of our larger functions grow, people might be adding parameters between spacing and collapse, meaning that the wrong values may be going to parameters that aren’t named. In addition sometimes it’s not always clear as to what needs to go in. We’re after a way to force people to name certain parameters – not just a coding standard, but ideally a flag or pydev plugin?

so that in the above 4 examples, only the last would pass the check as all the parameters are named.

Odds are we’ll only turn it on for certain functions, but any suggestions as to how to implement this – or if it’s even possible would be appreciated.


回答 0

在Python 3中-是,您可以*在参数列表中指定。

文档

“ *”或“ * identifier”之后的参数仅是关键字参数,并且只能使用关键字参数传递。

>>> def foo(pos, *, forcenamed):
...   print(pos, forcenamed)
... 
>>> foo(pos=10, forcenamed=20)
10 20
>>> foo(10, forcenamed=20)
10 20
>>> foo(10, 20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes exactly 1 positional argument (2 given)

也可以结合使用**kwargs

def foo(pos, *, forcenamed, **kwargs):

In Python 3 – Yes, you can specify * in the argument list.

From docs:

Parameters after “*” or “*identifier” are keyword-only parameters and may only be passed using keyword arguments.

>>> def foo(pos, *, forcenamed):
...   print(pos, forcenamed)
... 
>>> foo(pos=10, forcenamed=20)
10 20
>>> foo(10, forcenamed=20)
10 20
>>> foo(10, 20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes exactly 1 positional argument (2 given)

This can also be combined with **kwargs:

def foo(pos, *, forcenamed, **kwargs):
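
For example, a call against such a signature might look like this (the argument values are just an illustration; the extra keyword simply lands in kwargs):

>>> def foo(pos, *, forcenamed, **kwargs):
...     print(pos, forcenamed, kwargs)
... 
>>> foo(10, forcenamed=20, extra='spam')
10 20 {'extra': 'spam'}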

回答 1

您可以通过以下方式定义函数来强制人们在Python3中使用关键字参数。

def foo(*, arg0="default0", arg1="default1", arg2="default2"):
    pass

通过将第一个参数设置为不带名称的位置参数,您可以强制每个调用该函数的人都使用关键字参数,这正是我想问的。在Python2中,唯一的方法是定义一个这样的函数

def foo(**kwargs):
    pass

这将迫使调用者使用kwargs,但这并不是一个很好的解决方案,因为您随后必须进行检查以仅接受所需的参数。

You can force people to use keyword arguments in Python3 by defining a function in the following way.

def foo(*, arg0="default0", arg1="default1", arg2="default2"):
    pass

By making the first argument a positional argument with no name you force everyone who calls the function to use the keyword arguments which is what I think you were asking about. In Python2 the only way to do this is to define a function like this

def foo(**kwargs):
    pass

That’ll force the caller to use kwargs but this isn’t that great of a solution as you’d then have to put a check to only accept the argument that you need.
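
A hedged sketch of what that extra check might look like (the parameter names mirror the example above and are otherwise arbitrary):

def foo(**kwargs):
    # reject anything we don't know about, then fall back to defaults
    allowed = {'arg0', 'arg1', 'arg2'}
    unexpected = set(kwargs) - allowed
    if unexpected:
        raise TypeError('unexpected keyword argument(s): %s' % ', '.join(sorted(unexpected)))
    arg0 = kwargs.get('arg0', 'default0')
    arg1 = kwargs.get('arg1', 'default1')
    arg2 = kwargs.get('arg2', 'default2')
    return arg0, arg1, arg2

print(foo(arg1='x'))   # ('default0', 'x', 'default2')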


回答 2

的确,大多数编程语言都将参数顺序作为函数调用协定的一部分,但这不是必须的。为什么会这样?我对这个问题的理解是,Python在这方面是否与其他编程语言有所不同。除了适用于Python 2的其他良好答案外,请考虑以下因素:

__named_only_start = object()

def info(param1,param2,param3,_p=__named_only_start,spacing=10,collapse=1):
    if _p is not __named_only_start:
        raise TypeError("info() takes at most 3 positional arguments")
    return str(param1+param2+param3) +"-"+ str(spacing) +"-"+ str(collapse)

调用方要想按位置(而不触发异常)提供spacing和collapse参数,唯一的办法是:

info(arg1, arg2, arg3, module.__named_only_start, 11, 2)

在Python中,不使用属于其他模块的私有元素这一约定本来就是非常基本的。与Python本身的风格一样,这种参数约定也只能算是半强制执行。

否则,调用将采用以下形式:

info(arg1, arg2, arg3, spacing=11, collapse=2)

而这样的一个调用

info(arg1, arg2, arg3, 11, 2)

会将值11赋给参数_p,并由该函数的第一条指令引发异常。

特点:

  • _p=__named_only_start之前的参数可以按位置(或按名称)传入。
  • _p=__named_only_start之后的参数必须仅通过名称提供(除非获得并使用了关于特殊哨兵对象__named_only_start的知识)。

优点:

  • 参数在数量和含义上都是明确的(当然,后者还要求选了好名字)。
  • 如果将哨兵指定为第一个参数,则所有参数都需要按名称指定。
  • 调用该函数时,可以通过在相应位置使用哨兵对象__named_only_start来切换到位置模式。
  • 可以预期比其他替代方案更好的性能。

缺点:

  • 检查发生在运行时,而不是编译时。
  • 使用了额外的形参(尽管不是实参)和额外的检查。相对于常规函数而言,性能略有下降。
  • 功能是没有该语言直接支持的黑客(请参阅下面的注释)。
  • 调用该函数时,可以通过在正确的位置使用哨兵对象__named_only_start来切换到位置模式。是的,这也可以被看作一个优点。

请记住,该答案仅对Python 2有效。Python3实现了类似的,但非常优雅的,语言支持的机制,在其他答案中也有描述。

我发现,当我敞开思路去思考时,没有哪个问题或别人的决定会显得愚蠢、傻气或可笑。恰恰相反:我通常会学到很多东西。

True, most programming languages make parameter order part of the function call contract, but this doesn’t need to be so. Why would it? My understanding of the question is, then, if Python is any different to other programming languages in this respect. In addition to other good answers for Python 2, please consider the following:

__named_only_start = object()

def info(param1,param2,param3,_p=__named_only_start,spacing=10,collapse=1):
    if _p is not __named_only_start:
        raise TypeError("info() takes at most 3 positional arguments")
    return str(param1+param2+param3) +"-"+ str(spacing) +"-"+ str(collapse)

The only way a caller would be able to provide arguments spacing and collapse positionally (without an exception) would be:

info(arg1, arg2, arg3, module.__named_only_start, 11, 2)

The convention of not using private elements belonging to other modules already is very basic in Python. As with Python itself, this convention for parameters would only be semi-enforced.

Otherwise, calls would need to be of the form:

info(arg1, arg2, arg3, spacing=11, collapse=2)

A call

info(arg1, arg2, arg3, 11, 2)

would assign value 11 to parameter _p and an exception risen by the function’s first instruction.

Characteristics:

  • Parameters before _p=__named_only_start are admitted positionally (or by name).
  • Parameters after _p=__named_only_start must be provided by name only (unless knowledge about the special sentinel object __named_only_start is obtained and used).

Pros:

  • Parameters are explicit in number and meaning (the later if good names are also chosen, of course).
  • If the sentinel is specified as first parameter, then all arguments need to be specified by name.
  • When calling the function, it’s possible to switch to positional mode by using the sentinel object __named_only_start in the corresponding position.
  • A better performance than other alternatives can be anticipated.

Cons:

  • Checking occurs during run-time, not compile-time.
  • Use of an extra parameter (though not argument) and an additional check. Small performance degradation respect to regular functions.
  • Functionality is a hack without direct support by the language (see note below).
  • When calling the function, it’s possible to switch to positional mode by using the sentinel object __named_only_start in the right position. Yes, this can also be seen as a pro.

Please do keep in mind that this answer is only valid for Python 2. Python 3 implements the similar, but very elegant, language-supported mechanism described in other answers.

I’ve found that when I open my mind and think about it, no question or other’s decision seems stupid, dumb, or just silly. Quite on the contrary: I typically learn a lot.


回答 3

您可以通过使“伪造的”第一个关键字参数具有默认值而不会“自然地”出现,从而以在Python 2和Python 3中都可以使用的方式 来实现。该关键字参数前面可以有一个或多个没有值的参数:

_dummy = object()

def info(object, _kw=_dummy, spacing=10, collapse=1):
    if _kw is not _dummy:
        raise TypeError("info() takes 1 positional argument but at least 2 were given")

这将允许:

info(odbchelper)        
info(odbchelper, collapse=0)        
info(spacing=15, object=odbchelper)

但不是:

info(odbchelper, 12)                

如果将功能更改为:

def info(_kw=_dummy, spacing=10, collapse=1):

那么所有参数都必须具有关键字,并且info(odbchelper)将不再起作用。

这样,您便可以把其他关键字参数放在_kw之后的任何位置,而不必把它们放在最后一个条目之后。这通常是有意义的,例如,按逻辑分组或按字母顺序排列关键字有助于维护和开发。

因此,无需退回到def(**kwargs)的写法而在智能编辑器中丢失签名信息。您的"社会契约"是提供某些信息;通过强制(其中一部分)必须用关键字传入,它们出现的顺序就变得无关紧要了。

You can do that in a way that works in both Python 2 and Python 3, by making a “bogus” first keyword argument with a default value that will not occur “naturally”. That keyword argument can be preceded by one or more arguments without default values:

_dummy = object()

def info(object, _kw=_dummy, spacing=10, collapse=1):
    if _kw is not _dummy:
        raise TypeError("info() takes 1 positional argument but at least 2 were given")

This will allow:

info(odbchelper)        
info(odbchelper, collapse=0)        
info(spacing=15, object=odbchelper)

but not:

info(odbchelper, 12)                

If you change the function to:

def info(_kw=_dummy, spacing=10, collapse=1):

then all arguments must have keywords and info(odbchelper) will no longer work.

This will allow you to position additional keyword arguments any place after _kw, without forcing you to put them after the last entry. This often makes sense, e.g. grouping thing logically or arranging keywords alphabetically can help with maintenance and development.

So there is no need to revert to using def(**kwargs) and losing the signature information in your smart editor. Your social contract is to provide certain information; by forcing (some of) it to be passed as keywords, the order in which it is presented becomes irrelevant.


回答 4

更新:

我意识到使用**kwargs并不能解决问题。如果您的程序员根据需要更改函数参数,则可以例如将函数更改为:

def info(foo, **kwargs):

并且旧代码将再次中断(因为现在每个函数调用都必须包含第一个参数)。

确实归结为布莱恩所说的话。


(…)人们可能在spacingcollapse(…)之间添加了参数

通常,在更改函数时,新参数应始终结尾。否则,它将破坏代码。应该很明显。
如果有人更改了功能使代码中断,则必须拒绝此更改。
(正如布莱恩所说,这就像是一份合同)

(…)有时不清楚需要输入什么。

通过查看函数的签名(即def info(object, spacing=10, collapse=1)),应该立即看到每个没有默认值的参数都是强制性的。参数的用途是
什么,应该放在文档字符串中。


旧答案(保持完整性)

这可能不是一个好的解决方案:

您可以通过以下方式定义函数:

def info(**kwargs):
    ''' Some docstring here describing possible and mandatory arguments. '''
    spacing = kwargs.get('spacing', 15)
    obj = kwargs.get('object', None)
    if not obj:
       raise ValueError('object is needed')

kwargs是包含任何关键字参数的字典。您可以检查是否存在强制性参数,如果不存在,则引发异常。

不利的一面是,可能不再是显而易见的,哪些参数是可能的,但是使用适当的文档字符串,应该没问题。

Update:

I realized that using **kwargs would not solve the problem. If your programmers change function arguments as they wish, one could, for example, change the function to this:

def info(foo, **kwargs):

and the old code would break again (because now every function call has to include the first argument).

It really comes down to what Bryan says.


(…) people might be adding parameters between spacing and collapse (…)

In general, when changing functions, new arguments should always go to the end. Otherwise it breaks the code. Should be obvious.
If someone changes the function so that the code breaks, this change has to be rejected.
(As Bryan says, it is like a contract)

(…) sometimes it’s not always clear as to what needs to go in.

By looking at the signature of the function (i.e def info(object, spacing=10, collapse=1) ) one should immediately see that every argument that has not a default value, is mandatory.
What the argument is for, should go into the docstring.


Old answer (kept for completeness):

This is probably not a good solution:

You can define functions this way:

def info(**kwargs):
    ''' Some docstring here describing possible and mandatory arguments. '''
    spacing = kwargs.get('spacing', 15)
    obj = kwargs.get('object', None)
    if not obj:
       raise ValueError('object is needed')

kwargs is a dictionary that contains any keyword argument. You can check whether a mandatory argument is present and if not, raise an exception.

The downside is, that it might not be that obvious anymore, which arguments are possible, but with a proper docstring, it should be fine.


回答 5

Python 3的仅关键字参数(*)可以在Python 2.x中用**kwargs来模拟

考虑以下python3代码:

def f(pos_arg, *, no_default, has_default='default'):
    print(pos_arg, no_default, has_default)

及其行为:

>>> f(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes 1 positional argument but 3 were given
>>> f(1, no_default='hi')
1 hi default
>>> f(1, no_default='hi', has_default='hello')
1 hi hello
>>> f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() missing 1 required keyword-only argument: 'no_default'
>>> f(1, no_default=1, wat='wat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() got an unexpected keyword argument 'wat'

可以用下面的方法来模拟这一点;请注意,在"必需的命名参数"情况下,我擅自把TypeError换成了KeyError,要让它抛出同样的异常类型其实也不需要太多额外工作

def f(pos_arg, **kwargs):
    no_default = kwargs.pop('no_default')
    has_default = kwargs.pop('has_default', 'default')
    if kwargs:
        raise TypeError('unexpected keyword argument(s) {}'.format(', '.join(sorted(kwargs))))

    print(pos_arg, no_default, has_default)

行为:

>>> f(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 1 argument (3 given)
>>> f(1, no_default='hi')
(1, 'hi', 'default')
>>> f(1, no_default='hi', has_default='hello')
(1, 'hi', 'hello')
>>> f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
KeyError: 'no_default'
>>> f(1, no_default=1, wat='wat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in f
TypeError: unexpected keyword argument(s) wat

这一方法在python3.x中同样有效,但如果您只面向python3.x,则应避免这样写

The python3 keyword-only arguments (*) can be simulated in python2.x with **kwargs

Consider the following python3 code:

def f(pos_arg, *, no_default, has_default='default'):
    print(pos_arg, no_default, has_default)

and its behaviour:

>>> f(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes 1 positional argument but 3 were given
>>> f(1, no_default='hi')
1 hi default
>>> f(1, no_default='hi', has_default='hello')
1 hi hello
>>> f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() missing 1 required keyword-only argument: 'no_default'
>>> f(1, no_default=1, wat='wat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() got an unexpected keyword argument 'wat'

This can be simulated using the following, note I’ve taken the liberty of switching TypeError to KeyError in the “required named argument” case, it wouldn’t be too much work to make that the same exception type as well

def f(pos_arg, **kwargs):
    no_default = kwargs.pop('no_default')
    has_default = kwargs.pop('has_default', 'default')
    if kwargs:
        raise TypeError('unexpected keyword argument(s) {}'.format(', '.join(sorted(kwargs))))

    print(pos_arg, no_default, has_default)

And behaviour:

>>> f(1, 2, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 1 argument (3 given)
>>> f(1, no_default='hi')
(1, 'hi', 'default')
>>> f(1, no_default='hi', has_default='hello')
(1, 'hi', 'hello')
>>> f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
KeyError: 'no_default'
>>> f(1, no_default=1, wat='wat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in f
TypeError: unexpected keyword argument(s) wat

The recipe works equally as well in python3.x, but should be avoided if you are python3.x only


回答 6

您可以将函数声明为仅接收**args。这会强制使用关键字参数,但您需要做一些额外的工作来确保只传入有效的名称。

def foo(**args):
   print args

foo(1,2) # Raises TypeError: foo() takes exactly 0 arguments (2 given)
foo(hello = 1, goodbye = 2) # Works fine.

You could declare your functions as receiving **args only. That would mandate keyword arguments but you’d have some extra work to make sure only valid names are passed in.

def foo(**args):
   print args

foo(1,2) # Raises TypeError: foo() takes exactly 0 arguments (2 given)
foo(hello = 1, goodbye = 2) # Works fine.

回答 7

正如其他答案所说,更改功能签名是一个坏主意。在末尾添加新参数,或者在插入参数的情况下修复每个调用方。

如果仍然想这样做,请使用函数装饰器和inspect.getargspec函数。用法大致如下:

@require_named_args
def info(object, spacing=10, collapse=1):
    ....

require_named_args的实现留给读者作为练习。

我不会打扰。每次调用该函数的速度都会很慢,通过更仔细地编写代码,您将获得更好的结果。

As other answers say, changing function signatures is a bad idea. Either add new parameters to the end, or fix every caller if arguments are inserted.

If you still want to do it, use a function decorator and the inspect.getargspec function. It would be used something like this:

@require_named_args
def info(object, spacing=10, collapse=1):
    ....

Implementation of require_named_args is left as an exercise for the reader.

I would not bother. It will be slow every time the function is called, and you will get better results from writing code more carefully.
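
For completeness, here is one hedged way such a decorator could be written; it is only a sketch that rejects all positional arguments and uses the argspec for a friendlier error message:

import functools
import inspect

def require_named_args(func):
    # the answer suggests inspect.getargspec; on Python 3 the equivalent is getfullargspec
    arg_names = inspect.getfullargspec(func).args

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if args:
            raise TypeError('%s() only accepts keyword arguments: %s'
                            % (func.__name__, ', '.join(arg_names)))
        return func(**kwargs)
    return wrapper

@require_named_args
def info(object, spacing=10, collapse=1):
    return object, spacing, collapse

print(info(object='odbchelper', spacing=12))   # ('odbchelper', 12, 1)
info('odbchelper')                              # raises TypeError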


回答 8

您可以使用**运算符:

def info(**kwargs):

这样,人们被迫使用命名参数。

You could use the ** operator:

def info(**kwargs):

this way people are forced to use named parameters.


回答 9

def cheeseshop(kind, *arguments, **keywords):

在python中,如果使用*args,则意味着您可以为该参数传递任意数量的位置参数,它们在函数内部将作为一个元组来访问

如果使用** kw表示其关键字参数,则可以按dict的方式进行访问-您可以传递n个数量的kw args,并且如果要限制该用户必须按顺序输入序列和参数,则不要使用*和**-(它为大型架构提供通用解决方案的pythonic方法…)

如果要使用默认值限制功能,则可以在其中检查

def info(object, spacing, collapse):
  spacing = spacing or 10
  collapse = collapse or 1
def cheeseshop(kind, *arguments, **keywords):

in python if use *args that means you can pass n-number of positional arguments for this parameter – which will be accessed as a tuple inside the function.

And if use **kw that means its keyword arguments, that can be access as dict – you can pass n-number of kw args, and if you want to restrict that user must enter the sequence and arguments in order then don’t use * and ** – (its pythonic way to provide generic solutions for big architectures…)

if you want to restrict your function with default values then you can check inside it

def info(object, spacing, collapse):
  spacing = 10 if spacing is None else spacing
  collapse = 1 if collapse is None else collapse

回答 10

我不明白为什么程序员会首先在其他两个之间添加参数。

如果您希望函数参数与名称一起使用(例如, info(spacing=15, object=odbchelper)),则定义它们的顺序无关紧要,因此您最好将新参数放在最后。

如果您确实希望订单很重要,那么更改后就别指望了!

I don’t get why a programmer will add a parameter in between two others in the first place.

If you want the function parameters to be used with names (e.g. info(spacing=15, object=odbchelper) ) then it shouldn’t matter what order they are defined in, so you might as well put the new parameters at the end.

If you do want the order to matter then can’t expect anything to work if you change it!


PEP 8,为什么关键字参数或默认参数值的’=’周围没有空格?

问题:PEP 8,为什么关键字参数或默认参数值的’=’周围没有空格?

为什么PEP 8建议在关键字参数或默认参数值的=周围不要使用空格?

这是否与"在Python代码中=的其他所有出现位置周围都建议加空格"相矛盾?

怎么:

func(1, 2, very_long_variable_name=another_very_long_variable_name)

优于:

func(1, 2, very_long_variable_name = another_very_long_variable_name)

Python的BDFL与讨论/解释的任何链接将不胜感激。

请注意,这个问题更多的是关于kwargs而不是默认值,我只是使用了PEP 8中的措词。

我不是在征求意见,而是在问这个决定背后的原因。这更像是在问为什么我会在C程序中把{和if语句放在同一行,而不是问我是否应该这样做。

Why does PEP 8 recommend not having spaces around = in a keyword argument or a default parameter value?

Is this inconsistent with recommending spaces around every other occurrence of = in Python code?

How is:

func(1, 2, very_long_variable_name=another_very_long_variable_name)

better than:

func(1, 2, very_long_variable_name = another_very_long_variable_name)

Any links to discussion/explanation by Python’s BDFL will be appreciated.

Mind, this question is more about kwargs than default values, i just used the phrasing from PEP 8.

I’m not soliciting opinions. I’m asking for reasons behind this decision. It’s more like asking why would I use { on the same line as if statement in a C program, not whether I should use it or not.


回答 0

我猜这是因为关键字参数与变量赋值本质上是不同的。

例如,有很多这样的代码:

kw1 = some_value
kw2 = some_value
kw3 = some_value
some_func(
    1,
    2,
    kw1=kw1,
    kw2=kw2,
    kw3=kw3)

如您所见,把变量赋给同名的关键字参数是很常见的,因此不带空格的写法反而提高了可读性:更容易看出我们在使用关键字参数,而不是在把一个变量赋值给它自己。

同样,参数往往在同一行中,而赋值通常每个都在各自的行中,因此节省空间可能是一个重要问题。

I guess that it is because a keyword argument is essentially different than a variable assignment.

For example, there is plenty of code like this:

kw1 = some_value
kw2 = some_value
kw3 = some_value
some_func(
    1,
    2,
    kw1=kw1,
    kw2=kw2,
    kw3=kw3)

As you see, it makes complete sense to assign a variable to a keyword argument named exactly the same, so it improves readability to see them without spaces. It is easier to recognize that we are using keyword arguments and not assigning a variable to itself.

Also, parameters tend to go in the same line whereas assignments usually are each one in their own line, so saving space is likely to be an important matter there.


回答 1

我不会使用very_long_variable_name作为默认参数。所以考虑一下:

func(1, 2, axis='x', angle=90, size=450, name='foo bar')

在此:

func(1, 2, axis = 'x', angle = 90, size = 450, name = 'foo bar')

同样,将变量用作默认值也没有多大意义。也许某些常量变量(实际上不是常量),在这种情况下,我将使用全为大写的名称,以描述性的方式表示,但尽可能简短。所以没有another_very _…

I wouldn’t use very_long_variable_name as a default argument. So consider this:

func(1, 2, axis='x', angle=90, size=450, name='foo bar')

over this:

func(1, 2, axis = 'x', angle = 90, size = 450, name = 'foo bar')

Also, it doesn’t make much sense to use variables as default values. Perhaps some constant variables (which aren’t really constants) and in that case I would use names that are all caps, descriptive yet short as possible. So no another_very_…


回答 2

有优点也有缺点。

我非常不喜欢符合PEP8的代码读起来的样子。我不接受"very_long_variable_name=another_very_long_variable_name比very_long_variable_name = another_very_long_variable_name对人类更易读"这种论点。人们不是这样阅读的。这是额外的认知负担,尤其是在没有语法高亮的情况下。

但是,这也有很大的好处。如果遵守这条空格规则,用工具专门搜索关键字参数就会有效得多。

There are pros and cons.

I very much dislike how PEP8 compliant code reads. I don’t buy into the argument that very_long_variable_name=another_very_long_variable_name can ever be more human readable than very_long_variable_name = another_very_long_variable_name. This is not how people read. It’s an additional cognitive load, particularly in the absence of syntax highlighting.

There is a significant benefit, however. If the spacing rules are adhered to, it makes searching for parameters exclusively using tools much more effective.


回答 3

IMO省略了用于args的空间,从而提供了更清晰的arg / value对可视分组。看起来不那么混乱。

IMO leaving out the spaces for args provides cleaner visual grouping of the arg/value pairs; it looks less cluttered.


回答 4

我认为有几个原因,尽管我可能只是在进行合理化:

  1. 它节省了空间,允许更多的函数定义和调用适合一行,并为参数名称本身节省了更多空间。
  2. 通过将每个关键字和值连接在一起,可以更轻松地用逗号后的空格分隔不同的参数。这意味着您可以快速查看提供的参数数量。
  3. 这样,语法便与可能具有相同名称的变量分配不同。
  4. 此外,该语法(甚至更多)不同于相等检查a == b,相等检查也可以是调用内的有效表达式。

I think there are several reasons for this, although I might just be rationalizing:

  1. It saves space, allowing more function definitions and calls to fit on one line and saving more space for the argument names themselves.
  2. By joining each keyword and value, you can more easily separate the different arguments by the space after the comma. This means you can quickly eyeball how many arguments you’ve supplied.
  3. The syntax is then distinct from variable assignments, which may have the same name.
  4. Additionally, the syntax is (even more) distinct from equality checks a == b which can also be valid expressions inside a call.

回答 5

对我来说,它使代码更具可读性,因此是一个很好的约定。

我认为变量赋值和函数关键字赋值在风格上的关键区别在于:前者一行上通常只有一个=,而后者一行上通常有多个=。

如果没有其他考虑,我们会更喜欢foo = 42而不是foo=42,因为后者不是等号通常的排版方式,而且前者用空格在视觉上很好地分隔了变量和值。

但当一行中有多个赋值时,我们会更喜欢f(foo=42, bar=43, baz=44)而不是f(foo = 42, bar = 43, baz = 44),因为前者用空格在视觉上分隔了几个赋值,而后者没有,使得关键字/值对在哪里变得有点难以分辨。

换一种说法:这个约定背后有一种一致性。这种一致性是:"最高层级的分隔"通过空格在视觉上变得更清晰,而更低层级的分隔则不加空格(否则会与分隔更高层级的空白混淆)。对于变量赋值,最高层级的分隔是在变量和值之间;对于函数关键字赋值,最高层级的分隔是在各个赋值彼此之间。

For me it makes code more readable and is thus a good convention.

I think the key difference in terms of style between variable assignments and function keyword assignments is that there should only be a single = on a line for the former, whereas generally there are multiple =s on a line for the latter.

If there were no other considerations, we would prefer foo = 42 to foo=42, because the latter is not how equals signs are typically formatted, and because the former nicely visually separates the variable and value with whitespace.

But when there are multiple assignments on one line, we prefer f(foo=42, bar=43, baz=44) to f(foo = 42, bar = 43, baz = 44), because the former visually separates the several assignments with whitespace, whereas the latter does not, making it a bit harder to see where the keyword/value pairs are.

Here’s another way of putting it: there is a consistency behind the convention. That consistency is this: the “highest level of separation” is made visually clearer via spaces. Any lower levels of separation are not (because it would be confused with the whitespace separating the higher level). For variable assignment, the highest level of separation is between variable and value. For function keyword assignment, the highest level of separation is between the individual assignments themselves.


python中列表推导或生成器表达式的行连续

问题:python中列表推导或生成器表达式的行连续

您应该如何拆分很长的列表推导式?

[something_that_is_pretty_long for something_that_is_pretty_long in somethings_that_are_pretty_long]

我还看到某个地方的人不喜欢使用’\’来分隔行,但从不理解为什么。这是什么原因呢?

How are you supposed to break up a very long list comprehension?

[something_that_is_pretty_long for something_that_is_pretty_long in somethings_that_are_pretty_long]

I have also seen somewhere that people that dislike using ‘\’ to break up lines, but never understood why. What is the reason behind this?


回答 0

[x
 for
 x
 in
 (1,2,3)
]

效果很好,因此您几乎可以随心所欲。我个人更喜欢

 [something_that_is_pretty_long
  for something_that_is_pretty_long
  in somethings_that_are_pretty_long]

\之所以不受欢迎,是因为它出现在行尾,要么不显眼,要么需要额外的填充,而当行长改变时还必须重新调整:

x = very_long_term                     \
  + even_longer_term_than_the_previous \
  + a_third_term

在这种情况下,请使用括号:

x = (very_long_term
     + even_longer_term_than_the_previous
     + a_third_term)
[x
 for
 x
 in
 (1,2,3)
]

works fine, so you can pretty much do as you please. I’d personally prefer

 [something_that_is_pretty_long
  for something_that_is_pretty_long
  in somethings_that_are_pretty_long]

The reason why \ isn’t appreciated very much is that it appears at the end of a line, where it either doesn’t stand out or needs extra padding, which has to be fixed when line lengths change:

x = very_long_term                     \
  + even_longer_term_than_the_previous \
  + a_third_term

In such cases, use parens:

x = (very_long_term
     + even_longer_term_than_the_previous
     + a_third_term)

回答 1

我不反对:

variable = [something_that_is_pretty_long
            for something_that_is_pretty_long
            in somethings_that_are_pretty_long]

在这种情况下,您不需要\。总的来说,我认为人们避免使用\是因为它有点丑,而且如果它不是一行上最后一个字符,还会带来问题(要确保它后面没有空格)。不过为了控制行长,我认为用它总比不用好得多。

由于\在上述情况下或对于带括号的表达式不是必需的,所以我实际上很少需要使用它。

I’m not opposed to:

variable = [something_that_is_pretty_long
            for something_that_is_pretty_long
            in somethings_that_are_pretty_long]

You don’t need \ in this case. In general, I think people avoid \ because it’s slightly ugly, but also can give problems if it’s not the very last thing on the line (make sure no whitespace follows it). I think it’s much better to use it than not, though, in order to keep your line lengths down.

Since \ isn’t necessary in the above case, or for parenthesized expressions, I actually find it fairly rare that I even need to use it.


回答 2

在处理多个数据结构的列表时,也可以使用多个缩进。

new_list = [
    {
        'attribute 1': a_very_long_item.attribute1,
        'attribute 2': a_very_long_item.attribute2,
        'list_attribute': [
            {
                'dict_key_1': attribute_item.attribute2,
                'dict_key_2': attribute_item.attribute2
            }
            for attribute_item
            in a_very_long_item.list_of_items
         ]
    }
    for a_very_long_item
    in a_very_long_list
    if a_very_long_item not in [some_other_long_item
        for some_other_long_item 
        in some_other_long_list
    ]
]

请注意,它还如何使用if语句过滤到另一个列表。将if语句放到自己的行中也是有用的。

You can also make use of multiple indentations in cases where you’re dealing with a list of several data structures.

new_list = [
    {
        'attribute 1': a_very_long_item.attribute1,
        'attribute 2': a_very_long_item.attribute2,
        'list_attribute': [
            {
                'dict_key_1': attribute_item.attribute2,
                'dict_key_2': attribute_item.attribute2
            }
            for attribute_item
            in a_very_long_item.list_of_items
         ]
    }
    for a_very_long_item
    in a_very_long_list
    if a_very_long_item not in [some_other_long_item
        for some_other_long_item 
        in some_other_long_list
    ]
]

Notice how it also filters against another list using an if statement. Dropping the if statement onto its own line is useful as well.
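
A tiny runnable version of the same pattern, with made-up data, may make the layout easier to follow:

# Two input lists: the orders to summarize and the ids to exclude.
orders = [
    {"id": 1, "items": ["apple", "banana"]},
    {"id": 2, "items": ["cherry"]},
]
cancelled_ids = [2]

summaries = [
    {
        "order_id": order["id"],
        "item_list": [
            {"name": item, "length": len(item)}
            for item
            in order["items"]
        ],
    }
    for order
    in orders
    if order["id"] not in [
        cancelled_id
        for cancelled_id
        in cancelled_ids
    ]
]
# summaries now describes only order 1, because order 2 was filtered out.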


排序Python`import x`和`from x import y`语句的正确方法是什么?

问题:排序Python`import x`和`from x import y`语句的正确方法是什么?

Python 风格指南建议像这样对导入进行分组：

导入应按以下顺序分组:

  1. 标准库导入
  2. 相关的第三方导入
  3. 本地应用程序/特定于库的导入

但是，它没有提及这两种不同的导入写法应如何排列：

from foo import bar
import foo

对它们进行排序有多种方法(假设所有这些导入都属于同一组):

  • 首先from..import,然后import

    from g import gg
    from x import xx
    import abc
    import def
    import x
    
  • 首先import,然后from..import

    import abc
    import def
    import x
    from g import gg
    from x import xx
    
  • 按模块名称的字母顺序,忽略导入的类型

    import abc
    import def
    from g import gg
    import x
    from xx import xx
    

PEP8 没有提及这种情况下的首选顺序，某些 IDE 的“清理导入”功能可能只是按该功能开发者自己的喜好来处理。

我在寻找另一个对此做出澄清的 PEP，或者来自 BDFL（或其他 Python 核心开发人员）的相关评论/邮件。请不要发布只陈述个人偏好的主观答案。

The Python style guide suggests grouping imports like this:

Imports should be grouped in the following order:

  1. standard library imports
  2. related third party imports
  3. local application/library specific imports

However, it does not mention anything about how the two different kinds of imports should be laid out:

from foo import bar
import foo

There are multiple ways to sort them (let’s assume all these imports belong to the same group):

  • first from..import, then import

    from g import gg
    from x import xx
    import abc
    import def
    import x
    
  • first import, then from..import

    import abc
    import def
    import x
    from g import gg
    from x import xx
    
  • alphabetic order by module name, ignoring the kind of import

    import abc
    import def
    from g import gg
    import x
    from xx import xx
    

PEP8 does not mention a preferred order for this, and the “cleanup imports” features some IDEs have probably just do whatever the developer of that feature preferred.

I’m looking for another PEP clarifying this or a relevant comment/email from the BDFL (or another Python core developer). Please don’t post subjective answers stating your own preference.


回答 0

导入通常按字母顺序排序，这在 PEP 8 之外的很多地方都有说明。

按字母顺序排序的模块读起来更快，也更容易搜索。毕竟 Python 最讲究可读性。此外，这样也更容易确认某个东西是否已经导入，并能避免重复导入。

PEP 8 中没有关于这种排序的任何规定，所以具体使用哪种方式完全取决于你的选择。

根据一些知名站点和代码库中的参考资料以及流行程度来看，按字母顺序排序是通行的做法。

例如:

import httplib
import logging
import random
import StringIO
import time
import unittest
from nova.api import openstack
from nova.auth import users
from nova.endpoint import cloud

要么

import a_standard
import b_standard

import a_third_party
import b_third_party

from a_soc import f
from a_soc import g
from b_soc import d

Reddit 的官方代码库也指出，一般应使用 PEP-8 的导入顺序，不过有以下几点补充：

for each imported group the order of imports should be:
import <package>.<module> style lines in alphabetical order
from <package>.<module> import <symbol> style in alphabetical order


PS: isort 工具可以自动对你的导入进行排序。

Imports are generally sorted alphabetically, and this is described in various places besides PEP 8.

Alphabetically sorted modules are quicker to read and search through. After all, Python is all about readability. It also makes it easier to verify that something is imported, and it helps avoid duplicated imports.

There is nothing in PEP 8 about this particular ordering, so it is entirely a matter of which style you choose.

According to a few references from reputable sites and repositories, as well as general popularity, alphabetical ordering is the way to go.

For example, like this:

import httplib
import logging
import random
import StringIO
import time
import unittest
from nova.api import openstack
from nova.auth import users
from nova.endpoint import cloud

OR

import a_standard
import b_standard

import a_third_party
import b_third_party

from a_soc import f
from a_soc import g
from b_soc import d

Reddit’s official repository also states that, in general, PEP-8 import ordering should be used; however, there are a few additions:

for each imported group the order of imports should be:
import <package>.<module> style lines in alphabetical order
from <package>.<module> import <symbol> style in alphabetical order


PS: the isort utility automatically sorts your imports.
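
A small sketch of using isort programmatically (this assumes the third-party isort package, version 5 or later, is installed; isort.code() returns the source text with its imports sorted):

import isort

messy = (
    "import sys\n"
    "from collections import OrderedDict\n"
    "import json\n"
    "import os\n"
)
# Print the same snippet with its imports grouped and alphabetized.
print(isort.code(messy))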


回答 1

根据CIA的内部编码约定(WikiLeaks Vault 7泄漏的一部分),python导入应分为三类:

  1. 标准库导入
  2. 第三方导入
  3. 特定于应用程序的导入

在这些组中,应按字典顺序对导入进行排序,而忽略大小写:

import foo
from foo import bar
from foo.bar import baz
from foo.bar import Quux
from Foob import ar

According to the CIA’s internal coding conventions (part of the WikiLeaks Vault 7 leak), python imports should be grouped into three groups:

  1. Standard library imports
  2. Third-party imports
  3. Application-specific imports

Imports should be ordered lexicographically within these groups, ignoring case:

import foo
from foo import bar
from foo.bar import baz
from foo.bar import Quux
from Foob import ar

回答 2

PEP 8 确实对此没有任何说明。在这一点上并没有既定约定，这也不意味着 Python 社区就必须定义一个。某种选择可能对一个项目更好，对另一个项目却最糟……这只是个偏好问题，因为每种方案都各有利弊。但如果你想遵循约定，就必须遵守你引用的那个主要顺序：

  1. 标准库导入
  2. 相关的第三方导入
  3. 本地应用程序/特定于库的导入

例如，Google 在此页面中建议在每个类别（标准库/第三方/你自己的）内按字典顺序对导入进行排序。但在 Facebook、Yahoo 等其他地方，可能又是另一种约定……

Indeed, PEP 8 says nothing about it. There is no convention on this point, and that doesn’t mean the Python community absolutely needs to define one. A choice can be better for one project but worse for another… It is a question of preference, since each solution has pros and cons. But if you want to follow conventions, you have to respect the principal order you quoted:

  1. standard library imports
  2. related third party imports
  3. local application/library specific imports

For example, Google recommends on this page that imports be sorted lexicographically within each category (standard/third party/yours). But at Facebook, Yahoo and elsewhere, it may be another convention…


回答 3

我强烈推荐 reorder-python-imports。它遵循已接受答案中的第二个选项，并且还能集成到 pre-commit 中，非常有帮助。

I highly recommend reorder-python-imports. It follows the 2nd option of the accepted answer and also integrates into pre-commit, which is super helpful.


回答 4

所有 import x 语句应按 x 的值排序，所有 from x import y 语句也应按 x 的值按字母顺序排序，并且排序后的 from x import y 语句组必须排在排序后的 import x 语句组之后。

import abc
import def
import x
from g import gg
from x import xx
from z import a

All import x statements should be sorted alphabetically by the value of x, all from x import y statements should likewise be sorted by the value of x, and the sorted group of from x import y statements must follow the sorted group of import x statements.

import abc
import def
import x
from g import gg
from x import xx
from z import a

回答 5

我觉得已接受的答案有点太冗长。这是TLDR:

在每个分组中，应按照每个模块的完整包路径，按字典顺序对导入进行排序，并忽略大小写

Google代码样式指南

因此,第三个选项是正确的:

import abc
import def
from g import yy  # changed gg->yy for illustrative purposes
import x
from xx import xx

I feel like the accepted answer is a bit too verbose. Here is the TL;DR:

Within each grouping, imports should be sorted lexicographically, ignoring case, according to each module’s full package path

Google code style guide

So, the third option is correct:

import abc
import def
from g import yy  # changed gg->yy for illustrative purposes
import x
from xx import xx