Python 实用宝典

Question 1

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem…

Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3 digit numbers and number of repetition greater than 1

For example: if the string was: 123412345123456 then the function/program would return:

123 - 3 times
234 - 3 times
345 - 2 times

They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:

000 –> 999

Now that I’m thinking about it, I don’t think it’s possible to come up with a constant time algorithm. Is it?

Question 2

You got off lightly, you probably don’t want to be working for a hedge fund where the quants don’t understand basic algorithms :-)

There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.

Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that’s not usually how people use complexity analysis.

It appears to me you could have impressed them in a number of ways.

First, by informing them that it’s not possible to do it in O(1), unless you use the “suspect” reasoning given above.

Second, by showing your elite skills by providing Pythonic code such as:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

This outputs:

[(123, 3), (234, 3), (345, 2)]

though you could, of course, modify the output format to anything you desire.

And, finally, by telling them there’s almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.

And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.

Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):

You can farm these out to separate workers and combine the results afterwards.

The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of “measure, don’t guess” applies here, of course.

This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.

For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

Question 3

Constant time isn’t possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.

For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.

Question 4

A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.

Parallel execution version

I wrote a blog post on this with more explanation.

Question 5

The simple O(n) solution would be to count each 3-digit number:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

This would search through all 1 million digits 1000 times.

Traversing the digits only once:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

Timing shows that iterating only once over the index is twice as fast as using count.

Question 6

Here is a NumPy implementation of the “consensus” O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say “385”, adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

Unsurprisingly, NumPy is a bit faster than @Daniel’s pure Python solution on large data sets. Sample output:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

Question 7

I would solve the problem as follows:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

Applied to your example string, this yields:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.

Question 8

As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn’t exists already in the dictionary.

The code will look something like this:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

You can filter down to the keys which have item value greater than 1.

Question 9

As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.

However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.

My guess is that either the interviewer misspoke when they gave you the solution, or you misheard “constant time” when they said “constant space.”

Question 10

Here’s my answer:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

The array lookup method is very fast (even faster than @paul-panzer’s numpy method!). Of course, it cheats since it isn’t technicailly finished after it completes, because it’s returning a generator. It also doesn’t have to check every iteration if the value already exists, which is likely to help a lot.

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

Question 11

Image as answer:

IMAGE AS ANSWER

Looks like a sliding window.

Question 12

Here is my solution:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point. Hope it helps :)

Question 13

-Telling from the perspective of C. -You can have an int 3-d array results[10][10][10]; -Go from 0th location to n-4th location, where n being the size of the string array. -On each location, check the current, next and next’s next. -Increment the cntr as resutls[current][next][next’s next]++; -Print the values of

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-It is O(n) time, there is no comparisons involved. -You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.

Question 14

inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count

Question 15

cgi.escape seems like one possible choice. Does it work well? Is there something that is considered better?

Question 16

cgi.escape is fine. It escapes:

< to <
> to >
& to &

That is enough for all HTML.

EDIT: If you have non-ascii chars you also want to escape, for inclusion in another encoded document that uses a different encoding, like Craig says, just use:

data.encode('ascii', 'xmlcharrefreplace')

Don’t forget to decode data to unicode first, using whatever encoding it was encoded.

However in my experience that kind of encoding is useless if you just work with unicode all the time from start. Just encode at the end to the encoding specified in the document header (utf-8 for maximum compatibility).

Example:

>>> cgi.escape(u'<a>bá</a>').encode('ascii', 'xmlcharrefreplace')
'&lt;a&gt;b&#225;&lt;/a&gt;

Also worth of note (thanks Greg) is the extra quote parameter cgi.escape takes. With it set to True, cgi.escape also escapes double quote chars (") so you can use the resulting value in a XML/HTML attribute.

EDIT: Note that cgi.escape has been deprecated in Python 3.2 in favor of html.escape, which does the same except that quote defaults to True.

Question 17

In Python 3.2 a new html module was introduced, which is used for escaping reserved characters from HTML markup.

It has one function escape():

>>> import html
>>> html.escape('x > 2 && x < 7 single quote: \' double quote: "')
'x &gt; 2 &amp;&amp; x &lt; 7 single quote: &#x27; double quote: &quot;'

Question 18

If you wish to escape HTML in a URL:

This is probably NOT what the OP wanted (the question doesn’t clearly indicate in which context the escaping is meant to be used), but Python’s native library urllib has a method to escape HTML entities that need to be included in a URL safely.

The following is an example:

#!/usr/bin/python
from urllib import quote

x = '+<>^&'
print quote(x) # prints '%2B%3C%3E%5E%26'

Find docs here

Question 19

There is also the excellent markupsafe package.

>>> from markupsafe import Markup, escape
>>> escape("<script>alert(document.cookie);</script>")
Markup(u'&lt;script&gt;alert(document.cookie);&lt;/script&gt;')

The markupsafe package is well engineered, and probably the most versatile and Pythonic way to go about escaping, IMHO, because:

the return (Markup) is a class derived from unicode (i.e. isinstance(escape('str'), unicode) == True
it properly handles unicode input
it works in Python (2.6, 2.7, 3.3, and pypy)
it respects custom methods of objects (i.e. objects with a __html__ property) and template overloads (__html_format__).

Question 20

cgi.escape should be good to escape HTML in the limited sense of escaping the HTML tags and character entities.

But you might have to also consider encoding issues: if the HTML you want to quote has non-ASCII characters in a particular encoding, then you would also have to take care that you represent those sensibly when quoting. Perhaps you could convert them to entities. Otherwise you should ensure that the correct encoding translations are done between the “source” HTML and the page it’s embedded in, to avoid corrupting the non-ASCII characters.

Question 21

No libraries, pure python, safely escapes text into html text:

text.replace('&', '&amp;').replace('>', '&gt;').replace('<', '&lt;'
        ).encode('ascii', 'xmlcharrefreplace')

Question 22

`cgi.escape` extended

This version improves cgi.escape. It also preserves whitespace and newlines. Returns a unicode string.

def escape_html(text):
    """escape strings for display in HTML"""
    return cgi.escape(text, quote=True).\
           replace(u'\n', u'<br />').\
           replace(u'\t', u'&emsp;').\
           replace(u'  ', u' &nbsp;')

for example

>>> escape_html('<foo>\nfoo\t"bar"')
u'&lt;foo&gt;<br />foo&emsp;&quot;bar&quot;'

Question 23

Not the easiest way, but still straightforward. The main difference from cgi.escape module – it still will work properly if you already have & in your text. As you see from comments to it:

cgi.escape version

def escape(s, quote=None):
    '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
is also translated.'''
    s = s.replace("&", "&amp;") # Must be done first!
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

regex version

QUOTE_PATTERN = r"""([&<>"'])(?!(amp|lt|gt|quot|#39);)"""
def escape(word):
    """
    Replaces special characters <>&"' to HTML-safe sequences. 
    With attention to already escaped characters.
    """
    replace_with = {
        '<': '&gt;',
        '>': '&lt;',
        '&': '&amp;',
        '"': '&quot;', # should be escaped in attributes
        "'": '&#39'    # should be escaped in attributes
    }
    quote_pattern = re.compile(QUOTE_PATTERN)
    return re.sub(quote_pattern, lambda x: replace_with[x.group(0)], word)

Question 24

For legacy code in Python 2.7, can do it via BeautifulSoup4:

>>> bs4.dammit import EntitySubstitution
>>> esub = EntitySubstitution()
>>> esub.substitute_html("r&d")
'r&amp;d'

Question 25

I’ve been using python for years, but I have little experience with python web programming. I’d like to create a very simple web service that exposes some functionality from an existing python script for use within my company. It will likely return the results in csv. What’s the quickest way to get something up? If it affects your suggestion, I will likely be adding more functionality to this, down the road.

Question 26

Have a look at werkzeug. Werkzeug started as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility modules. It includes a powerful debugger, full featured request and response objects, HTTP utilities to handle entity tags, cache control headers, HTTP dates, cookie handling, file uploads, a powerful URL routing system and a bunch of community contributed addon modules.

It includes lots of cool tools to work with http and has the advantage that you can use it with wsgi in different environments (cgi, fcgi, apache/mod_wsgi or with a plain simple python server for debugging).

Question 27

web.py is probably the simplest web framework out there. “Bare” CGI is simpler, but you’re completely on your own when it comes to making a service that actually does something.

“Hello, World!” according to web.py isn’t much longer than an bare CGI version, but it adds URL mapping, HTTP command distinction, and query parameter parsing for free:

import web

urls = (
    '/(.*)', 'hello'
)
app = web.application(urls, globals())

class hello:        
    def GET(self, name):
        if not name: 
            name = 'world'
        return 'Hello, ' + name + '!'

if __name__ == "__main__":
    app.run()

Question 28

The simplest way to get a Python script online is to use CGI:

#!/usr/bin/python

print "Content-type: text/html"
print

print "<p>Hello world.</p>"

Put that code in a script that lives in your web server CGI directory, make it executable, and run it. The cgi module has a number of useful utilities when you need to accept parameters from the user.

Question 29

Raw CGI is kind of a pain, Django is kind of heavyweight. There are a number of simpler, lighter frameworks about, e.g. CherryPy. It’s worth looking around a bit.

Question 30

Look at the WSGI reference implementation. You already have it in your Python libraries. It’s quite simple.

Question 31

If you mean with “Web Service” something accessed by other Programms SimpleXMLRPCServer might be right for you. It is included with every Python install since Version 2.2.

For Simple human accessible things I usually use Pythons SimpleHTTPServer which also comes with every install. Obviously you also could access SimpleHTTPServer by client programs.

Question 32

Life is simple if you get a good web framework. Web services in Django are easy. Define your model, write view functions that return your CSV documents. Skip the templates.

Question 33

If you mean “web service” in SOAP/WSDL sense, you might want to look at Generating a WSDL using Python and SOAPpy

Question 34

maybe Twisted http://twistedmatrix.com/trac/

Question 35

Given the following format (.properties or .ini):

propertyName1=propertyValue1
propertyName2=propertyValue2
...
propertyNameN=propertyValueN

For Java there is the Properties class that offers functionality to parse / interact with the above format.

Is there something similar in python‘s standard library (2.x) ?

If not, what other alternatives do I have ?

Question 36

For .ini files there is the ConfigParser module that provides a format compatible with .ini files.

Anyway there’s nothing available for parsing complete .properties files, when I have to do that I simply use jython (I’m talking about scripting).

Question 37

I was able to get this to work with ConfigParser, no one showed any examples on how to do this, so here is a simple python reader of a property file and example of the property file. Note that the extension is still .properties, but I had to add a section header similar to what you see in .ini files… a bit of a bastardization, but it works.

The python file: PythonPropertyReader.py

#!/usr/bin/python    
import ConfigParser
config = ConfigParser.RawConfigParser()
config.read('ConfigFile.properties')

print config.get('DatabaseSection', 'database.dbname');

The property file: ConfigFile.properties

[DatabaseSection]
database.dbname=unitTest
database.user=root
database.password=

For more functionality, read: https://docs.python.org/2/library/configparser.html

Question 38

A java properties file is often valid python code as well. You could rename your myconfig.properties file to myconfig.py. Then just import your file, like this

import myconfig

and access the properties directly

print myconfig.propertyName1

Question 39

I know that this is a very old question, but I need it just now and I decided to implement my own solution, a pure python solution, that covers most uses cases (not all):

def load_properties(filepath, sep='=', comment_char='#'):
    """
    Read the file passed as parameter as a properties file.
    """
    props = {}
    with open(filepath, "rt") as f:
        for line in f:
            l = line.strip()
            if l and not l.startswith(comment_char):
                key_value = l.split(sep)
                key = key_value[0].strip()
                value = sep.join(key_value[1:]).strip().strip('"') 
                props[key] = value 
    return props

You can change the sep to ‘:’ to parse files with format:

key : value

The code parses correctly lines like:

url = "http://my-host.com"
name = Paul = Pablo
# This comment line will be ignored

You’ll get a dict with:

{"url": "http://my-host.com", "name": "Paul = Pablo" }

Question 40

If you have an option of file formats I suggest using .ini and Python’s ConfigParser as mentioned. If you need compatibility with Java .properties files I have written a library for it called jprops. We were using pyjavaproperties, but after encountering various limitations I ended up implementing my own. It has full support for the .properties format, including unicode support and better support for escape sequences. Jprops can also parse any file-like object while pyjavaproperties only works with real files on disk.

Question 41

if you don’t have multi line properties and a very simple need, a few lines of code can solve it for you:

File t.properties:

a=b
c=d
e=f

Python code:

with open("t.properties") as f:
    l = [line.split("=") for line in f.readlines()]
    d = {key.strip(): value.strip() for key, value in l}

Question 42

This is not exactly properties but Python does have a nice library for parsing configuration files. Also see this recipe: A python replacement for java.util.Properties.

Question 43

Here is link to my project: https://sourceforge.net/projects/pyproperties/. It is a library with methods for working with *.properties files for Python 3.x.

But it is not based on java.util.Properties

Question 44

This is a one-to-one replacement of java.util.Propeties

From the doc:

  def __parse(self, lines):
        """ Parse a list of lines and create
        an internal property dictionary """

        # Every line in the file must consist of either a comment
        # or a key-value pair. A key-value pair is a line consisting
        # of a key which is a combination of non-white space characters
        # The separator character between key-value pairs is a '=',
        # ':' or a whitespace character not including the newline.
        # If the '=' or ':' characters are found, in the line, even
        # keys containing whitespace chars are allowed.

        # A line with only a key according to the rules above is also
        # fine. In such case, the value is considered as the empty string.
        # In order to include characters '=' or ':' in a key or value,
        # they have to be properly escaped using the backslash character.

        # Some examples of valid key-value pairs:
        #
        # key     value
        # key=value
        # key:value
        # key     value1,value2,value3
        # key     value1,value2,value3 \
        #         value4, value5
        # key
        # This key= this value
        # key = value1 value2 value3

        # Any line that starts with a '#' is considerered a comment
        # and skipped. Also any trailing or preceding whitespaces
        # are removed from the key/value.

        # This is a line parser. It parses the
        # contents like by line.

Question 45

You can use a file-like object in ConfigParser.RawConfigParser.readfp defined here -> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.readfp

Define a class that overrides readline that adds a section name before the actual contents of your properties file.

I’ve packaged it into the class that returns a dict of all the properties defined.

import ConfigParser

class PropertiesReader(object):

    def __init__(self, properties_file_name):
        self.name = properties_file_name
        self.main_section = 'main'

        # Add dummy section on top
        self.lines = [ '[%s]\n' % self.main_section ]

        with open(properties_file_name) as f:
            self.lines.extend(f.readlines())

        # This makes sure that iterator in readfp stops
        self.lines.append('')

    def readline(self):
        return self.lines.pop(0)

    def read_properties(self):
        config = ConfigParser.RawConfigParser()

        # Without next line the property names will be lowercased
        config.optionxform = str

        config.readfp(self)
        return dict(config.items(self.main_section))

if __name__ == '__main__':
    print PropertiesReader('/path/to/file.properties').read_properties()

Question 46

i have used this, this library is very useful

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print(p)
print(p.items())
print(p['name3'])
p['name3'] = 'changed = value'

Question 47

This is what I’m doing in my project: I just create another .py file called properties.py which includes all common variables/properties I used in the project, and in any file need to refer to these variables, put

from properties import *(or anything you need)

Used this method to keep svn peace when I was changing dev locations frequently and some common variables were quite relative to local environment. Works fine for me but not sure this method would be suggested for formal dev environment etc.

Question 48

import json
f=open('test.json')
x=json.load(f)
f.close()
print(x)

Contents of test.json: {“host”: “127.0.0.1”, “user”: “jms”}

Question 49

I have created a python module that is almost similar to the Properties class of Java ( Actually it is like the PropertyPlaceholderConfigurer in spring which lets you use ${variable-reference} to refer to already defined property )

EDIT : You may install this package by running the command(currently tested for python 3).
pip install property

The project is hosted on GitHub

Example : ( Detailed documentation can be found here )

Let’s say you have the following properties defined in my_file.properties file

foo = I am awesome
bar = ${chocolate}-bar
chocolate = fudge

Code to load the above properties

from properties.p import Property

prop = Property()
# Simply load it into a dictionary
dic_prop = prop.load_property_files('my_file.properties')

Question 50

If you need to read all values from a section in properties file in a simple manner:

Your config.properties file layout :

[SECTION_NAME]  
key1 = value1  
key2 = value2

You code:

   import configparser

   config = configparser.RawConfigParser()
   config.read('path_to_config.properties file')

   details_dict = dict(config.items('SECTION_NAME'))

This will give you a dictionary where keys are same as in config file and their corresponding values.

details_dict is :

{'key1':'value1', 'key2':'value2'}

Now to get key1’s value : details_dict['key1']

Putting it all in a method which reads that section from config file only once(the first time the method is called during a program run).

def get_config_dict():
    if not hasattr(get_config_dict, 'config_dict'):
        get_config_dict.config_dict = dict(config.items('SECTION_NAME'))
    return get_config_dict.config_dict

Now call the above function and get the required key’s value :

config_details = get_config_dict()
key_1_value = config_details['key1']

————————————————————-

Extending the approach mentioned above, reading section by section automatically and then accessing by section name followed by key name.

def get_config_section():
    if not hasattr(get_config_section, 'section_dict'):
        get_config_section.section_dict = dict()

        for section in config.sections():
            get_config_section.section_dict[section] = 
                             dict(config.items(section))

    return get_config_section.section_dict

To access:

config_dict = get_config_section()

port = config_dict['DB']['port']

(here ‘DB’ is a section name in config file and ‘port’ is a key under section ‘DB’.)

Question 51

Below 2 lines of code shows how to use Python List Comprehension to load ‘java style’ property file.

split_properties=[line.split("=") for line in open('/<path_to_property_file>)]
properties={key: value for key,value in split_properties }

Please have a look at below post for details https://ilearnonlinesite.wordpress.com/2017/07/24/reading-property-file-in-python-using-comprehension-and-generators/

Question 52

you can use parameter “fromfile_prefix_chars” with argparse to read from config file as below—

temp.py

parser = argparse.ArgumentParser(fromfile_prefix_chars='#')
parser.add_argument('--a')
parser.add_argument('--b')
args = parser.parse_args()
print(args.a)
print(args.b)

config file

--a
hello
--b
hello dear

Run command

python temp.py "#config"

Question 53

I did this using ConfigParser as follows. The code assumes that there is a file called config.prop in the same directory where BaseTest is placed:

config.prop

[CredentialSection]
app.name=MyAppName

BaseTest.py:

import unittest
import ConfigParser

class BaseTest(unittest.TestCase):
    def setUp(self):
        __SECTION = 'CredentialSection'
        config = ConfigParser.ConfigParser()
        config.readfp(open('config.prop'))
        self.__app_name = config.get(__SECTION, 'app.name')

    def test1(self):
        print self.__app_name % This should print: MyAppName

Question 54

This is what i had written to parse file and set it as env variables which skips comments and non key value lines added switches to specify hg:d

-h or –help print usage summary
-c Specify char that identifies comment
-s Separator between key and value in prop file

and specify properties file that needs to be parsed eg : python EnvParamSet.py -c # -s = env.properties

import pipes
import sys , getopt
import os.path

class Parsing :

        def __init__(self , seprator , commentChar , propFile):
        self.seprator = seprator
        self.commentChar = commentChar
        self.propFile  = propFile

    def  parseProp(self):
        prop = open(self.propFile,'rU')
        for line in prop :
            if line.startswith(self.commentChar)==False and  line.find(self.seprator) != -1  :
                keyValue = line.split(self.seprator)
                key =  keyValue[0].strip() 
                value = keyValue[1].strip() 
                        print("export  %s=%s" % (str (key),pipes.quote(str(value))))




class EnvParamSet:

    def main (argv):

        seprator = '='
        comment =  '#'

        if len(argv)  is 0:
            print "Please Specify properties file to be parsed "
            sys.exit()
        propFile=argv[-1] 


        try :
            opts, args = getopt.getopt(argv, "hs:c:f:", ["help", "seprator=","comment=", "file="])
        except getopt.GetoptError,e:
            print str(e)
            print " possible  arguments  -s <key value sperator > -c < comment char >    <file> \n  Try -h or --help "
            sys.exit(2)


        if os.path.isfile(args[0])==False:
            print "File doesnt exist "
            sys.exit()


        for opt , arg  in opts :
            if opt in ("-h" , "--help"):
                print " hg:d  \n -h or --help print usage summary \n -c Specify char that idetifes comment  \n -s Sperator between key and value in prop file \n  specify file  "
                sys.exit()
            elif opt in ("-s" , "--seprator"):
                seprator = arg 
            elif opt in ("-c"  , "--comment"):
                comment  = arg

        p = Parsing( seprator, comment , propFile)
        p.parseProp()

    if __name__ == "__main__":
            main(sys.argv[1:])

Question 55

Lightbend has released the Typesafe Config library, which parses properties files and also some JSON-based extensions. Lightbend’s library is only for the JVM, but it seems to be widely adopted and there are now ports in many languages, including Python: https://github.com/chimpler/pyhocon

Question 56

You can use the following function, which is the modified code of @mvallebr. It respects the properties file comments, ignores empty new lines, and allows retrieving a single key value.

def getProperties(propertiesFile ="/home/memin/.config/customMemin/conf.properties", key=''):
    """
    Reads a .properties file and returns the key value pairs as dictionary.
    if key value is specified, then it will return its value alone.
    """
    with open(propertiesFile) as f:
        l = [line.strip().split("=") for line in f.readlines() if not line.startswith('#') and line.strip()]
        d = {key.strip(): value.strip() for key, value in l}

        if key:
            return d[key]
        else:
            return d

Question 57

this works for me.

from pyjavaproperties import Properties
p = Properties()
p.load(open('test.properties'))
p.list()
print p
print p.items()
print p['name3']

Question 58

I followed configparser approach and it worked quite well for me. Created one PropertyReader file and used config parser there to ready property to corresponding to each section.

**Used Python 2.7

Content of PropertyReader.py file:

#!/usr/bin/python
import ConfigParser

class PropertyReader:

def readProperty(self, strSection, strKey):
    config = ConfigParser.RawConfigParser()
    config.read('ConfigFile.properties')
    strValue = config.get(strSection,strKey);
    print "Value captured for "+strKey+" :"+strValue
    return strValue

Content of read schema file:

from PropertyReader import *

class ReadSchema:

print PropertyReader().readProperty('source1_section','source_name1')
print PropertyReader().readProperty('source2_section','sn2_sc1_tb')

Content of .properties file:

[source1_section]
source_name1:module1
sn1_schema:schema1,schema2,schema3
sn1_sc1_tb:employee,department,location
sn1_sc2_tb:student,college,country

[source2_section]
source_name1:module2
sn2_schema:schema4,schema5,schema6
sn2_sc1_tb:employee,department,location
sn2_sc2_tb:student,college,country

Question 59

create a dictionary in your python module and store everything into it and access it, for example:

dict = {
       'portalPath' : 'www.xyx.com',
       'elementID': 'submit'}

Now to access it you can simply do:

submitButton = driver.find_element_by_id(dict['elementID'])

Question 60

In pre-historic times (Python 1.4) we did:

fp = open('filename.txt')
while 1:
    line = fp.readline()
    if not line:
        break
    print line

after Python 2.1, we did:

for line in open('filename.txt').xreadlines():
    print line

before we got the convenient iterator protocol in Python 2.3, and could do:

for line in open('filename.txt'):
    print line

I’ve seen some examples using the more verbose:

with open('filename.txt') as fp:
    for line in fp:
        print line

is this the preferred method going forwards?

[edit] I get that the with statement ensures closing of the file… but why isn’t that included in the iterator protocol for file objects?

Question 61

There is exactly one reason why the following is preferred:

with open('filename.txt') as fp:
    for line in fp:
        print line

We are all spoiled by CPython’s relatively deterministic reference-counting scheme for garbage collection. Other, hypothetical implementations of Python will not necessarily close the file “quickly enough” without the with block if they use some other scheme to reclaim memory.

In such an implementation, you might get a “too many files open” error from the OS if your code opens files faster than the garbage collector calls finalizers on orphaned file handles. The usual workaround is to trigger the GC immediately, but this is a nasty hack and it has to be done by every function that could encounter the error, including those in libraries. What a nightmare.

Or you could just use the with block.

Bonus Question

(Stop reading now if are only interested in the objective aspects of the question.)

Why isn’t that included in the iterator protocol for file objects?

This is a subjective question about API design, so I have a subjective answer in two parts.

On a gut level, this feels wrong, because it makes iterator protocol do two separate things—iterate over lines and close the file handle—and it’s often a bad idea to make a simple-looking function do two actions. In this case, it feels especially bad because iterators relate in a quasi-functional, value-based way to the contents of a file, but managing file handles is a completely separate task. Squashing both, invisibly, into one action, is surprising to humans who read the code and makes it more difficult to reason about program behavior.

Other languages have essentially come to the same conclusion. Haskell briefly flirted with so-called “lazy IO” which allows you to iterate over a file and have it automatically closed when you get to the end of the stream, but it’s almost universally discouraged to use lazy IO in Haskell these days, and Haskell users have mostly moved to more explicit resource management like Conduit which behaves more like the with block in Python.

On a technical level, there are some things you may want to do with a file handle in Python which would not work as well if iteration closed the file handle. For example, suppose I need to iterate over the file twice:

with open('filename.txt') as fp:
    for line in fp:
        ...
    fp.seek(0)
    for line in fp:
        ...

While this is a less common use case, consider the fact that I might have just added the three lines of code at the bottom to an existing code base which originally had the top three lines. If iteration closed the file, I wouldn’t be able to do that. So keeping iteration and resource management separate makes it easier to compose chunks of code into a larger, working Python program.

Composability is one of the most important usability features of a language or API.

Question 62

Yes,

with open('filename.txt') as fp:
    for line in fp:
        print line

is the way to go.

It is not more verbose. It is more safe.

Question 63

if you’re turned off by the extra line, you can use a wrapper function like so:

def with_iter(iterable):
    with iterable as iter:
        for item in iter:
            yield item

for line in with_iter(open('...')):
    ...

in Python 3.3, the yield from statement would make this even shorter:

def with_iter(iterable):
    with iterable as iter:
        yield from iter

Question 64

f = open('test.txt','r')
for line in f.xreadlines():
    print line
f.close()

问题：给定一百万个数字的字符串，返回所有重复的3位数字

回答 0

回答 1

回答 2

并行执行版本

Parallel execution version

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

问题：在Python中转义HTML的最简单方法是什么？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

cgi.escape 扩展的

例如

cgi.escape extended

for example

回答 7

回答 8

问题：创建简单的python网络服务的最佳方法

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：python中的属性文件（类似于Java属性）

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

————————————————– ———–

————————————————————-

回答 15

回答 16

回答 17

回答 18

回答 19

回答 20

回答 21

回答 22

回答 23

问题：如何在Python中逐行读取文件？

回答 0

奖金问题

Bonus Question

回答 1

回答 2

回答 3

问题：我可以使用`pip`代替`easy_install`来实现`python setup.py install`依赖关系解析吗？

回答 0

从网络上的tarball安装

从本地tarball安装

从本地文件夹安装

从本地文件夹安装（可编辑模式）

`cgi.escape` 扩展的

`cgi.escape` extended