The first line of the Rationale section of PEP 338 says:
Python 2.4 adds the command line switch -m to allow modules to be located using the Python module namespace for execution as scripts. The motivating examples were standard library modules such as pdb and profile, and the Python 2.4 implementation is fine for this limited purpose.
So you can specify any module in Python’s search path this way, not just files in the current directory. You’re correct that, for a plain script in the current directory, python -m mymod1 mymod2.py args has exactly the same effect as python mymod1.py mymod2.py args. The first line of the Scope of this proposal section states:
In Python 2.4, a module located using -m is executed just as if its filename had been provided on the command line.
With -m more is possible, like working with modules which are part of a package, etc. That’s what the rest of PEP 338 is about. Read it for more info.
Despite this question having been asked and answered several times (e.g., here, here, here, and here) in my opinion no existing answer fully or concisely captures all the implications of the -m flag. Therefore, the following will attempt to improve on what has come before.
Introduction (TLDR)
The -m flag does a lot of things, not all of which will be needed all the time. In short it can be used to: (1) execute python code from the command line via modulename rather than filename (2) add a directory to sys.path for use in import resolution and (3) execute python code that contains relative imports from the command line.
Preliminaries
To explain the -m flag we first need to explain a little terminology.
Python’s primary organizational unit is known as a module. Modules come in one of two flavors: code modules and package modules. A code module is any file that contains Python executable code. A package module is a directory that contains other modules (either code modules or package modules). The most common type of code module is a *.py file, while the most common type of package module is a directory containing an __init__.py file.
Python allows modules to be uniquely identified in two distinct ways: modulename and filename. In general, modules are identified by modulename in Python code (e.g., import <modulename>) and by filename on the command line (e.g., python <filename>). All python interpreters are able to convert modulenames to filenames by following the same few, well-defined rules. These rules hinge on the sys.path variable. By altering this variable one can change how Python resolves modulenames into filenames (for more on how this is done see PEP 302).
All modules (both code and package) can be executed (i.e., code associated with the module will be evaluated by the Python interpreter). Depending on the execution method (and module type) what code gets evaluated, and when, can change quite a bit. For example, if one executes a package module via python -m <modulename> then the package’s __init__.py will be evaluated followed by its __main__.py. On the other hand, if one executes that same package module via import <modulename> then only the package’s __init__.py will be executed.
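For example, given a hypothetical package laid out like this (the name pkg and the print calls are invented purely for illustration), the different execution methods behave as follows:

# Hypothetical layout:
#   pkg/__init__.py   contains: print("init")
#   pkg/__main__.py   contains: print("main")

import pkg              # evaluates pkg/__init__.py only -> prints "init"

# From a shell:
#   python -m pkg       # evaluates __init__.py then __main__.py -> prints "init" then "main"
#   python pkg          # evaluates __main__.py only -> prints "main"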
Historical Development of -m
The -m flag was first introduced in Python 2.4.1. Initially its only purpose was to provide an alternative means of identifying the python module to execute from the command line. That is, if we knew both the <filename> and <modulename> for a module then the following two commands were equivalent: python <filename> <args> and python -m <modulename> <args>. One constraint with this iteration, according to PEP 338, was that -m only worked with top level modulenames (i.e., modules that could be found directly on sys.path without any intervening package modules).
With the completion of PEP 338 the -m feature was extended to support <modulename> representations beyond the top level. This meant names such as http.server were now fully supported. This extension also meant that each parent package in modulename was now evaluated (i.e., all parent package __init__.py files were evaluated) in addition to the module referenced by the modulename itself.
The final major feature enhancement for -m came with PEP 366. With this upgrade -m gained the ability to support not only absolute imports but also explicit relative imports when executing modules. This was achieved by changing -m so that it set the __package__ variable to the parent module of the given modulename (in addition to everything else it already did).
Use Cases
There are two notable use cases for the -m flag:
To execute modules from the command line for which one may not know their filename. This use case takes advantage of the fact that the Python interpreter knows how to convert modulenames to filenames. This is particularly advantageous when one wants to run stdlib or 3rd-party modules from the command line. For example, very few people know the filename for the http.server module but most people do know its modulename, so we can execute it from the command line using python -m http.server.
To execute a local package containing absolute or relative imports without needing to install it. This use case is detailed in PEP 338 and leverages the fact that the current working directory is added to sys.path rather than the module’s directory. This use case is very similar to using pip install -e . to install a package in develop/edit mode.
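As a rough sketch of this second use case (the project layout and names below are invented), running the package with -m from the project root makes its relative imports work without installing anything:

# Hypothetical layout, run from the directory that contains mypkg/:
#   mypkg/__init__.py
#   mypkg/helpers.py      contains: def greet(): print("hello")
#   mypkg/__main__.py     contains the two lines below

from .helpers import greet   # relative import resolves under "python -m mypkg"
greet()

# python -m mypkg              -> prints "hello"
# python mypkg/__main__.py     -> fails with "attempted relative import with no known parent package"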
Shortcomings
With all the enhancements made to -m over the years it still has one major shortcoming: it can only execute modules written in Python (i.e., *.py). For example, if -m is used to execute a C compiled code module the following error will be produced: No code object available for <modulename> (see here for more details).
Detailed Comparisons
Effects of module execution via import statement (i.e., import <modulename>):
sys.path is not modified in any way
__name__ is set to the absolute form of <modulename>
__package__ is set to the immediate parent package in <modulename>
__init__.py is evaluated for all packages (including its own for package modules)
__main__.py is not evaluated for package modules; the code is evaluated for code modules
Effects of module execution via command line (i.e., python <filename>):
sys.path is modified to include the directory containing <filename>
__name__ is set to '__main__'
__package__ is set to None
__init__.py is not evaluated for any package (including its own for package modules)
__main__.py is evaluated for package modules; the code is evaluated for code modules.
Effects of module execution via command line with the -m flag (i.e., python -m <modulename>):
sys.path is modified to include the current directory
__name__ is set to '__main__'
__package__ is set to the immediate parent package in <modulename>
__init__.py is evaluated for all packages (including its own for package modules)
__main__.py is evaluated for package modules; the code is evaluated for code modules
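To see these differences directly, one can drop a small probe module into a package and run it each way (a minimal sketch; mypkg and probe are invented names):

# Hypothetical file: mypkg/probe.py
print("__name__    =", __name__)
print("__package__ =", __package__)

# Per the comparisons above, one should observe roughly:
#   import mypkg.probe        -> __name__ == "mypkg.probe", __package__ == "mypkg"
#   python mypkg/probe.py     -> __name__ == "__main__",    __package__ is None
#   python -m mypkg.probe     -> __name__ == "__main__",    __package__ == "mypkg"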
Conclusion
The -m flag is, at its simplest, a means to execute python scripts from the command line by using modulenames rather than filenames. The real power of -m, however, is in its ability to combine the power of import statements (e.g., support for explicit relative imports and automatic package __init__ evaluation) with the convenience of the command line.
I have a Python program I’m building that can be run in either of 2 ways: the first is to call “python main.py” which prompts the user for input in a friendly manner and then runs the user input through the program. The other way is to call “python batch.py -file-” which will pass over all the friendly input gathering and run an entire file’s worth of input through the program in a single go.
The problem is that when I run “batch.py” it imports some variables/methods/etc from “main.py”, and when it runs this code:
import main
at the first line of the program, it immediately errors because it tries to run the code in “main.py”.
How can I stop Python from running the code contained in the “main” module which I’m importing?
Because this is just how Python works – keywords such as class and def are not declarations. Instead, they are real live statements which are executed. If they were not executed your module would be .. empty :-)
Anyway, the idiomatic approach is:
# stuff to run always here such as class/def
def main():
    pass

if __name__ == "__main__":
    # stuff only to run when not called via 'import' here
    main()
Due to the way Python works, it is necessary for it to run your modules when it imports them.
To prevent code in the module from being executed when imported, but only when run directly, you can guard it with this if:
if __name__ == "__main__":
    # this won't be run when imported
You may want to put this code in a main() method, so that you can either execute the file directly, or import the module and call the main(). For example, assume this is in the file foo.py.
def main():
    print "Hello World"

if __name__ == "__main__":
    main()
This program can be run either by going python foo.py, or from another Python script:
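For instance, from another script or an interactive session (a minimal sketch):

import foo
foo.main()    # prints "Hello World"; nothing ran automatically when foo was imported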
Use the if __name__ == '__main__' idiom — __name__ is a special variable whose value is '__main__' if the module is being run as a script, and the module name if it’s imported. So you’d do something like
# imports
# class/function definitions

if __name__ == '__main__':
    # code here will only run when you invoke 'python main.py'
Unfortunately, you don’t. That is part of how the import syntax works and it is important that it does so — remember def is actually something executed, if Python did not execute the import, you’d be, well, stuck without functions.
Since you probably have access to the file, though, you might be able to look and see what causes the error. It might be possible to modify your environment to prevent the error from happening.
Put the code inside a function and it won’t run until you call the function. You should have a main function in your main.py, with the statement:
if __name__ == '__main__':
    main()
Then, if you call python main.py the main() function will run. If you import main.py, it will not. Also, you should probably rename main.py to something else for clarity’s sake.
There was a Python enhancement proposal PEP 299 which aimed to replace the if __name__ == '__main__': idiom with def __main__:, but it was rejected. It’s still a good read to know what to keep in mind when using if __name__ == '__main__':.
Although you cannot use import without running the code, there is quite a swift way in which you can input your variables: by using numpy.savez, which stores variables as numpy arrays in a .npz file. Afterwards you can load the variables using numpy.load.
Try just importing the functions needed from main.py? So,
from main import SomeFunction
It could be that you’ve named a function in batch.py the same as one in main.py, and when you import main.py the program runs the main.py function instead of the batch.py function; doing the above should fix that. I hope.
I have a file with some probabilities for different values e.g.:
1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2
I would like to generate random numbers using this distribution. Is there an existing module that handles this? It’s fairly simple to code on your own (build the cumulative distribution function, generate a random value in [0,1] and pick the corresponding value) but it seems like this should be a common problem and probably someone has created a function/module for it.
I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).
scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.
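A minimal sketch of that approach, using the values and probabilities from the question:

from scipy import stats

xk = (1, 2, 3, 4, 5, 6)                    # the values
pk = (0.1, 0.05, 0.05, 0.2, 0.4, 0.2)      # their probabilities
custom = stats.rv_discrete(name='custom', values=(xk, pk))

print(custom.rvs(size=10))                 # e.g. [5 4 5 1 6 5 5 4 2 5]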
As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.
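For example, a short sketch with the question’s distribution:

import numpy as np

values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
print(np.random.choice(values, size=10, p=probs))   # e.g. [5 4 6 5 1 5 4 5 6 2]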
Since Python 3.6, there’s a solution for this in Python’s standard library, namely random.choices.
Example usage: let’s set up a population and weights matching those in the OP’s question:
>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
Now choices(population, weights) generates a single sample:
>>> choices(population, weights)
4
The optional keyword-only argument k allows one to request more than one sample at once. This is valuable because there’s some preparatory work that random.choices has to do every time it’s called, prior to generating any samples; by generating many samples at once, we only have to do that preparatory work once. Here we generate a million samples, and use collections.Counter to check that the distribution we get roughly matches the weights we gave.
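A sketch of that check (the counts below are one illustrative run and will differ every time):

>>> from collections import Counter
>>> Counter(choices(population, weights, k=10**6))
Counter({5: 400104, 4: 200297, 6: 199883, 1: 99804, 2: 50029, 3: 49883})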
An advantage of generating the list using the CDF is that you can use binary search. While you need O(n) time and space for preprocessing, you can get k numbers in O(k log n). Since normal Python lists are inefficient, you can use the array module.
If you insist on constant space, you can do the following; O(n) time, O(1) space.
import random

def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # Might occur because of floating point inaccuracies
Maybe it is kind of late. But you can use numpy.random.choice(), passing the p parameter:
val = numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
(OK, I know you are asking for shrink-wrap, but maybe those home-grown solutions just weren’t succinct enough for your liking. :-)
import random

pdf = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
cdf = [(i, sum(p for j,p in pdf if j < i)) for i,_ in pdf]
R = max(i for r in [random.random()] for i,c in cdf if c <= r)
I pseudo-confirmed that this works by eyeballing the output of this expression:
sorted(max(i for r in [random.random()] for i,c in cdf if c <= r)
for _ in range(1000))
I wrote a solution for drawing random samples from a custom continuous distribution.
I needed this for a similar use-case to yours (i.e. generating random dates with a given probability distribution).
You just need the function random_custDist and the line samples=random_custDist(x0,x1,custDist=custDist,size=1000). The rest is decoration ^^.
import numpy as np

# function
def random_custDist(x0, x1, custDist, size=None, nControl=10**6):
    # generate a list of size random samples, obeying the distribution custDist
    # suggests random samples between x0 and x1 and accepts the suggestion with probability custDist(x)
    # custDist does not need to be normalized. Add this condition to increase performance.
    # Best performance for max_{x in [x0,x1]} custDist(x) = 1
    samples = []
    nLoop = 0
    while len(samples) < size and nLoop < nControl:
        x = np.random.uniform(low=x0, high=x1)
        prop = custDist(x)
        assert prop >= 0 and prop <= 1
        if np.random.uniform(low=0, high=1) <= prop:
            samples += [x]
        nLoop += 1
    return samples

# call
x0 = 2007
x1 = 2019
def custDist(x):
    if x < 2010:
        return .3
    else:
        return (np.exp(x-2008)-1)/(np.exp(2019-2007)-1)
samples = random_custDist(x0, x1, custDist=custDist, size=1000)
print(samples)

# plot
import matplotlib.pyplot as plt

# hist
bins = np.linspace(x0, x1, int(x1-x0+1))
hist = np.histogram(samples, bins)[0]
hist = hist/np.sum(hist)
plt.bar((bins[:-1]+bins[1:])/2, hist, width=.96, label='sample distribution')

# dist
grid = np.linspace(x0, x1, 100)
discCustDist = np.array([custDist(x) for x in grid])  # discrete version
discCustDist *= 1/(grid[1]-grid[0])/np.sum(discCustDist)
plt.plot(grid, discCustDist, label='custom distribution (custDist)', color='C1', linewidth=4)

# decoration
plt.legend(loc=3, bbox_to_anchor=(1, 0))
plt.show()
The performance of this solution is improvable for sure, but I prefer readability.
Make a list of items based on their weights:
items = [1, 2, 3, 4, 5, 6]
probabilities= [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
# if the list of probs is normalized (sum(probs) == 1), omit this part
prob = sum(probabilities) # find sum of probs, to normalize them
c = (1.0)/prob # a multiplier to make a list of normalized probs
probabilities = map(lambda x: c*x, probabilities)
print probabilities
ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') -1
amounts = [ int(x*(10**ml)) for x in probabilities]
itemsList = list()
for i in range(0, len(items)):  # iterate through original items
    itemsList += items[i:i+1]*amounts[i]
# choose from itemsList randomly
print itemsList
An optimization may be to normalize amounts by the greatest common divisor, to make the target list smaller.
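A sketch of that optimization (Python 3; the variable names follow the code above):

from math import gcd
from functools import reduce

g = reduce(gcd, amounts)                 # e.g. gcd of [10, 5, 5, 20, 40, 20] is 5
amounts = [a // g for a in amounts]      # [2, 1, 1, 4, 8, 4] -> itemsList becomes 5x shorter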
import random

distribution = [(1, 0.2), (2, 0.3), (3, 0.5)]

def pick_value(distribution):  # wrapped in a function (name added here) so the return statements are valid
    # init distribution
    dlist = []
    sumchance = 0
    for value, chance in distribution:
        sumchance += chance
        dlist.append((value, sumchance))
    assert sumchance == 1.0  # not a good assert because of float equality

    # get random value
    r = random.random()

    # for small distributions use linear search
    if len(distribution) < 64:  # don't know exact speed limit
        for value, sumchance in dlist:
            if r < sumchance:
                return value
    else:
        # (not implemented) binary search algorithm
        pass
from __future__ import division
import random
from collections import Counter

def num_gen(num_probs):
    # calculate minimum probability to normalize
    min_prob = min(prob for num, prob in num_probs)
    lst = []
    for num, prob in num_probs:
        # keep appending num to lst, proportional to its probability in the distribution
        for _ in range(int(prob/min_prob)):
            lst.append(num)
    # all elems in lst occur proportional to their distribution probabilities
    while True:
        # pick a random index from lst
        ind = random.randint(0, len(lst)-1)
        yield lst[ind]
Verification:
gen = num_gen([(1, 0.1),
               (2, 0.05),
               (3, 0.05),
               (4, 0.2),
               (5, 0.4),
               (6, 0.2)])

lst = []
times = 10000
for _ in range(times):
    lst.append(next(gen))

# Verify the created distribution:
for item, count in Counter(lst).iteritems():
    print '%d has %f probability' % (item, count/times)
1 has 0.099737 probability
2 has 0.050022 probability
3 has 0.049996 probability
4 has 0.200154 probability
5 has 0.399791 probability
6 has 0.200300 probability
Based on other solutions, you generate a cumulative distribution (as integer or float, whatever you like), then you can use bisect to make it fast.
This is a simple example (I used integers here).
import bisect
import random

l = [(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]

def get_cdf(l):
    ret = []
    c = 0
    for i in l: c += i[0]; ret.append((c, i[1]))
    return ret

def get_random_item(cdf):
    return cdf[bisect.bisect_left(cdf, (random.randint(0, cdf[-1][0]),))][1]

cdf = get_cdf(l)
for i in range(100): print get_random_item(cdf),
The get_cdf function converts the weights from 20, 60, 10, 10 into 20, 20+60, 20+60+10, 20+60+10+10.
Now we pick a random number up to 20+60+10+10 using random.randint, then we use bisect to get the actual value in a fast way.
None of these answers is particularly clear or simple.
Here is a clear, simple method that is guaranteed to work.
accumulate_normalize_values takes a dictionary p that maps symbols to probabilities OR frequencies. It outputs a usable list of tuples from which to do selection.
def accumulate_normalize_values(p):
    pi = p.items() if isinstance(p, dict) else p
    accum_pi = []
    accum = 0
    for i in pi:
        accum_pi.append((i[0], i[1] + accum))
        accum += i[1]
    if accum == 0:
        raise Exception("You are about to explode the universe. Continue ? Y/N ")
    normed_a = []
    for a in accum_pi:
        normed_a.append((a[0], a[1] * 1.0 / accum))
    return normed_a
The accumulation step turns each symbol into an interval between itself and the previous symbol’s probability or frequency (or 0 in the case of the first symbol). These intervals can be used to select from (and thus sample the provided distribution) by simply stepping through the list until the random number in the interval 0.0 -> 1.0 (prepared earlier) is less than or equal to the current symbol’s interval end-point.
The normalization releases us from the need to make sure everything sums to some value. After normalization the “vector” of probabilities sums to 1.0.
The rest of the code for selection and for generating an arbitrarily long sample from the distribution is below:
def select(symbol_intervals, random):
    print symbol_intervals, random
    i = 0
    while random > symbol_intervals[i][1]:
        i += 1
        if i >= len(symbol_intervals):
            raise Exception("What did you DO to that poor list?")
    return symbol_intervals[i][0]

def gen_random(alphabet, length, probabilities=None):
    from random import random
    from itertools import repeat
    if probabilities is None:
        probabilities = dict(zip(alphabet, repeat(1.0)))
    elif len(probabilities) > 0 and isinstance(probabilities[0], (int, long, float)):
        probabilities = dict(zip(alphabet, probabilities))  # ordered
    usable_probabilities = accumulate_normalize_values(probabilities)
    gen = []
    while len(gen) < length:
        gen.append(select(usable_probabilities, random()))
    return gen
Usage :
>>> gen_random (['a','b','c','d'],10,[100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c'] #<--- some of the time
Just call the following function with your ‘weights’ array (assuming the indices as the corresponding items) and the number of samples needed. This function can be easily modified to handle ordered pairs.
Returns indexes (or items) sampled/picked (with replacement) using their respective probabilities:
import random

def resample(weights, n):
    beta = 0
    # Caveat: Assign max weight to max*2 for best results
    max_w = max(weights) * 2
    # Pick an item uniformly at random, to start with
    current_item = random.randint(0, n-1)
    result = []
    for i in range(n):
        beta += random.uniform(0, max_w)
        while weights[current_item] < beta:
            beta -= weights[current_item]
            current_item = (current_item + 1) % n  # cyclic
        else:
            result.append(current_item)
    return result
A short note on the concept used in the while loop.
We subtract the current item’s weight from beta (a cumulative value constructed uniformly at random) and increment the current index, in order to find the item whose weight matches the value of beta.
I want to define a constant that should be available in all of the submodules of a package. I’ve thought that the best place would be in the __init__.py file of the root package. But I don’t know how to do this. Suppose I have a few subpackages and each with several modules. How can I access that variable from these modules?
Of course, if this is totally wrong, and there is a better alternative, I’d like to know it.
You should be able to put them in __init__.py. This is done all the time.
mypackage/__init__.py:
MY_CONSTANT = 42
mypackage/mymodule.py:
from mypackage import MY_CONSTANT
print "my constant is", MY_CONSTANT
Then, import mymodule:
>>> from mypackage import mymodule
my constant is 42
Still, if you do have constants, it would be reasonable (best practices, probably) to put them in a separate module (constants.py, config.py, …) and then if you want them in the package namespace, import them.
mypackage/__init__.py:
from mypackage.constants import *
Still, this doesn’t automatically include the constants in the namespaces of the package modules. Each of the modules in the package will still have to import constants explicitly either from mypackage or from mypackage.constants.
You cannot do that. You will have to explicitly import your constants into each individual module’s namespace. The best way to achieve this is to define your constants in a “config” module and import it everywhere you require it:
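A minimal sketch of that layout (the package, module, and constant names are invented):

# mypackage/config.py
MY_CONSTANT = 42
API_URL = "https://example.com/api"

# mypackage/sub/module.py
from mypackage import config

def connect():
    print("connecting to", config.API_URL)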
You can define global variables from anywhere, but it is a really bad idea. Import the __builtin__ module and modify or add attributes to this module, and suddenly you have new builtin constants or functions. In fact, when my application installs gettext, I get the _() function in all my modules, without importing anything. So this is possible, but of course only for Application-type projects, not for reusable packages or modules.
And I guess no one would recommend this practice anyway. What’s wrong with a namespace? Said application has the version module, so that I have “global” variables available like version.VERSION, version.PACKAGE_NAME etc.
Just wanted to add that constants can be employed using a config.ini file and parsed in the script using the configparser library. This way you could have constants for multiple circumstances. For instance if you had parameter constants for two separate url requests just label them like so:
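A sketch of what that could look like (the section and key names are made up):

# config.ini (hypothetical contents):
#   [request_a]
#   url = https://example.com/a
#   timeout = 30
#
#   [request_b]
#   url = https://example.com/b
#   timeout = 60

import configparser

config = configparser.ConfigParser()
config.read("config.ini")

URL_A = config["request_a"]["url"]
TIMEOUT_A = config.getint("request_a", "timeout")
URL_B = config["request_b"]["url"]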
I found the documentation on the Python website very helpful. I am not sure if there are any differences between Python 2 and 3 so here are the links to both:
How can I implement the equivalent of a __getattr__ on a class, on a module?
Example
When calling a function that does not exist in a module’s statically defined attributes, I wish to create an instance of a class in that module, and invoke the method on it with the same name as failed in the attribute lookup on the module.
class A(object):
    def salutation(self, accusative):
        print "hello", accusative

# note this function is intentionally on the module, and not the class above
def __getattr__(mod, name):
    return getattr(A(), name)

if __name__ == "__main__":
    # i hope here to have my __getattr__ function above invoked, since
    # salutation does not exist in the current namespace
    salutation("world")
Which gives:
matt@stanley:~/Desktop$ python getattrmod.py
Traceback (most recent call last):
File "getattrmod.py", line 9, in <module>
salutation("world")
NameError: name 'salutation' is not defined
Recently some historical features have made a comeback, the module __getattr__ among them, and so the existing hack (a module replacing itself with a class in sys.modules at import time) should be no longer necessary.
In Python 3.7+, you just use the one obvious way. To customize attribute access on a module, define a __getattr__ function at the module level which should accept one argument (name of attribute), and return the computed value or raise an AttributeError:
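For example, a minimal sketch adapted to the question’s class A (the module name is invented, and this hooks attribute access on the module from the outside only):

# getattrmod.py  (Python 3.7+)
class A:
    def salutation(self, accusative):
        print("hello", accusative)

def __getattr__(name):
    # module-level __getattr__ (PEP 562): called only when normal lookup on the module fails
    return getattr(A(), name)

# From another module or the REPL:
#   import getattrmod
#   getattrmod.salutation("world")   # -> hello world
# Note: a bare salutation("world") inside getattrmod itself would still raise NameError,
# because module __getattr__ only intercepts attribute access on the module object.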
This will also allow hooks into “from” imports, i.e. you can return dynamically generated objects for statements such as from my_module import whatever.
On a related note, along with the module getattr you may also define a __dir__ function at module level to respond to dir(my_module). See PEP 562 for details.
There are two basic problems you are running into here:
__xxx__ methods are only looked up on the class
TypeError: can't set attributes of built-in/extension type 'module'
(1) means any solution would have to also keep track of which module was being examined, otherwise every module would then have the instance-substitution behavior; and (2) means that (1) isn’t even possible… at least not directly.
Fortunately, sys.modules is not picky about what goes there so a wrapper will work, but only for module access (i.e. import somemodule; somemodule.salutation('world'); for same-module access you pretty much have to yank the methods from the substitution class and add them to globals() either with a custom method on the class (I like using .export()) or with a generic function (such as those already listed as answers). One thing to keep in mind: if the wrapper is creating a new instance each time, and the globals solution is not, you end up with subtly different behavior. Oh, and you don’t get to use both at the same time — it’s one or the other.
There is actually a hack that is occasionally used and recommended: a
module can define a class with the desired functionality, and then at
the end, replace itself in sys.modules with an instance of that class
(or with the class, if you insist, but that’s generally less useful).
E.g.:
This works because the import machinery is actively enabling this
hack, and as its final step pulls the actual module out of
sys.modules, after loading it. (This is no accident. The hack was
proposed long ago and we decided we liked it enough to support it in the
import machinery.)
So the established way to accomplish what you want is to create a single class in your module, and as the last act of the module replace sys.modules[__name__] with an instance of your class — and now you can play with __getattr__/__setattr__/__getattribute__ as needed.
Note 1: If you use this functionality then anything else in the module, such as globals, other functions, etc., will be lost when the sys.modules assignment is made — so make sure everything needed is inside the replacement class.
Note 2: To support from module import * you must have __all__ defined in the class; for example:
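A minimal sketch of that pattern (module and method names are invented; the fallback behaviour is purely illustrative):

# mymodule.py
import sys

class _ModuleInterface:
    __all__ = ('salutation',)            # needed so "from mymodule import *" works

    def salutation(self, accusative):
        print("hello", accusative)

    def __getattr__(self, name):
        # any unknown attribute falls back to a generic greeting
        return lambda accusative: print(name, accusative)

# Last act of the module: replace the module object with an instance of the class.
sys.modules[__name__] = _ModuleInterface()

# Elsewhere:
#   import mymodule
#   mymodule.salutation("world")   # -> hello world
#   mymodule.farewell("world")     # -> farewell world (handled by __getattr__)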
class A(object):
    ....

# The implicit global instance
a = A()

def salutation(*arg, **kw):
    a.salutation(*arg, **kw)
Why? So that the implicit global instance is visible.
For examples, look at the random module, which creates an implicit global instance to slightly simplify the use cases where you want a “simple” random number generator.
Similar to what @Håvard S proposed, in a case where I needed to implement some magic on a module (like __getattr__), I would define a new class that inherits from types.ModuleType and put that in sys.modules (probably replacing the module where my custom ModuleType was defined).
See the main __init__.py file of Werkzeug for a fairly robust implementation of this.
This is a bit hacky, but…
import types

class A(object):
    def salutation(self, accusative):
        print "hello", accusative

    def farewell(self, greeting, accusative):
        print greeting, accusative

def AddGlobalAttribute(classname, methodname):
    print "Adding " + classname + "." + methodname + "()"
    def genericFunction(*args):
        return globals()[classname]().__getattribute__(methodname)(*args)
    globals()[methodname] = genericFunction

# set up the global namespace

x = 0   # X and Y are here to add them implicitly to globals, so
y = 0   # globals does not change as we iterate over it.

toAdd = []

def isCallableMethod(classname, methodname):
    someclass = globals()[classname]()
    something = someclass.__getattribute__(methodname)
    return callable(something)

for x in globals():
    print "Looking at", x
    if isinstance(globals()[x], (types.ClassType, type)):
        print "Found Class:", x
        for y in dir(globals()[x]):
            if y.find("__") == -1:  # hack to ignore default methods
                if isCallableMethod(x, y):
                    if y not in globals():  # don't override existing global names
                        toAdd.append((x, y))

for x in toAdd:
    AddGlobalAttribute(*x)

if __name__ == "__main__":
    salutation("world")
    farewell("goodbye", "world")
This works by iterating over all the objects in the global namespace. If the item is a class, it iterates over the class attributes. If the attribute is callable, it adds it to the global namespace as a function.
It ignores all attributes which contain “__”.
I wouldn’t use this in production code, but it should get you started.
Here’s my own humble contribution — a slight embellishment of @Håvard S’s highly rated answer, but a bit more explicit (so it might be acceptable to @S.Lott, even though probably not good enough for the OP):
Create your module file that has your classes. Import the module. Run getattr on the module you just imported. You can do a dynamic import using __import__ and pull the module from sys.modules.
I’m starting to learn python and loving it. I work on a Mac mainly as well as Linux. I’m finding that on Linux (Ubuntu 9.04 mostly) when I install a python module using apt-get it works fine. I can import it with no trouble.
On the Mac, I’m used to using Macports to install all the Unixy stuff. However, I’m finding that most of the python modules I install with it are not being seen by python. I’ve spent some time playing around with PATH settings and using python_select. Nothing has really worked and at this point I’m not really understanding, instead I’m just poking around.
I get the impression that Macports isn’t universally loved for managing python modules. I’d like to start fresh using a more “accepted” (if that’s the right word) approach.
So, I was wondering, what is the method that Mac python developers use to manage their modules?
Bonus questions:
Do you use Apple’s python, or some other version?
Do you compile everything from source or is there a package manger that works well (Fink?).
The most popular way to manage python packages (if you’re not using your system package manager) is to use setuptools and easy_install. It is probably already installed on your system. Use it like this:
easy_install django
easy_install uses the Python Package Index which is an amazing resource for python developers. Have a look around to see what packages are available.
A better option is pip, which is gaining traction, as it attempts to fix a lot of the problems associated with easy_install. Pip uses the same package repository as easy_install, it just works better. Really the only time you need to use easy_install is for this command:
easy_install pip
After that, use:
pip install django
At some point you will probably want to learn a bit about virtualenv. If you do a lot of python development on projects with conflicting package requirements, virtualenv is a godsend. It will allow you to have completely different versions of various packages, and switch between them easily depending on your needs.
Regarding which python to use, sticking with Apple’s python will give you the least headaches, but if you need a newer version (Leopard is 2.5.1 I believe), I would go with the macports python 2.6.
Your question is already three years old and there are some details not covered in other answers:
Most people I know use HomeBrew or MacPorts, I prefer MacPorts because of its clean cut of what is a default Mac OS X environment and my development setup. Just move out your /opt folder and test your packages with a normal user Python environment
MacPorts is only portable within Mac, but with easy_install or pip you will learn how to setup your environment in any platform (Win/Mac/Linux/Bsd…). Furthermore it will always be more up to date and with more packages
I personally let MacPorts handle my Python modules to keep everything updated. Like any other high level package manager (ie: apt-get) it is much better for the heavy lifting of modules with lots of binary dependencies. There is no way I would build my Qt bindings (PySide) with easy_install or pip. Qt is huge and takes a lot to compile. As soon as you want a Python package that needs a library used by non Python programs, try to avoid easy_install or pip
At some point you will find that there are some packages missing within MacPorts. I do not believe that MacPorts will ever give you the whole CheeseShop. For example, recently I needed the Elixir module, but MacPorts only offers py25-elixir and py26-elixir, no py27 version. In cases like these you have:
pip-2.7 install --user elixir
( make sure you always type pip-(version) )
That will build an extra Python library in your home dir. Yes, Python will work with more than one library location: one controlled by MacPorts and a user local one for everything missing within MacPorts.
Now notice that I favor pip over easy_install. There is a good reason you should avoid setuptools and easy_install. Here is a good explanation and I try to keep away from them. One very useful feature of pip is giving you a list of all the modules (along with their versions) that you installed with MacPorts, easy_install and pip itself:
pip-2.7 freeze
If you already started using easy_install, don’t worry, pip can recognize everything done already by easy_install and even upgrade the packages installed with it.
If you are a developer keep an eye on virtualenv for controlling different setups and combinations of module versions. Other answers mention it already, what is not mentioned so far is the Tox module, a tool for testing that your package installs correctly with different Python versions.
Although I usually do not have version conflicts, I like to have virtualenv to set up a clean environment and get a clear view of my packages dependencies. That way I never forget any dependencies in my setup.py
If you go for MacPorts be aware that multiple versions of the same package are not selected anymore like the old Debian style with an extra python_select package (it is still there for compatibility). Now you have the select command to choose which Python version will be used (you can even select the Apple installed ones):
$ port select python
Available versions for python:
none
python25-apple
python26-apple
python27 (active)
python27-apple
python32
$ port select python python32
Add tox on top of it and your programs should be really portable
Please see Python OS X development environment. The best way is to use MacPorts. Download and install MacPorts, then install Python via MacPorts by typing the following commands in the Terminal:
sudo port install python26 python_select
sudo port select --set python python26
OR
sudo port install python30 python_select
sudo port select --set python python30
Use the first set of commands to install Python 2.6 and the second set to install Python 3.0. Then use:
sudo port install py26-packagename
OR
sudo port install py30-packagename
In the above commands, replace packagename with the name of the package, for example:
sudo port install py26-setuptools
These commands will automatically install the package (and its dependencies) for the given Python version.
For a full list of available packages for Python, type:
port list | grep py26-
OR
port list | grep py30-
Which command you use depends on which version of Python you chose to install.
I use MacPorts to install Python and any third-party modules tracked by MacPorts into /opt/local, and I install any manually installed modules (those not in the MacPorts repository) into /usr/local, and this has never caused any problems. I think you may be confused as to the use of certain MacPorts scripts and environment variables.
MacPorts python_select is used to select the “current” version of Python, but it has nothing to do with modules. This allows you to, e.g., install both Python 2.5 and Python 2.6 using MacPorts, and switch between installs.
The $PATH environment variables does not affect what Python modules are loaded. $PYTHONPATH is what you are looking for. $PYTHONPATH should point to directories containing Python modules you want to load. In my case, my $PYTHONPATH variable contains /usr/local/lib/python26/site-packages. If you use MacPorts’ Python, it sets up the other proper directories for you, so you only need to add additional paths to $PYTHONPATH. But again, $PATH isn’t used at all when Python searches for modules you have installed.
$PATH is used to find executables, so if you install MacPorts’ Python, make sure /opt/local/bin is in your $PATH.
There’s nothing wrong with using a MacPorts Python installation. If you are installing python modules from MacPorts but then not seeing them, that likely means you are not invoking the MacPorts python you installed to. In a terminal shell, you can use absolute paths to invoke the various Pythons that may be installed. For example:
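For instance (the script name and the exact paths are placeholders; they depend on your OS release and what you have installed):

/usr/bin/python2.5 myscript.py          # Apple-supplied system Python
/opt/local/bin/python2.5 myscript.py    # MacPorts Python under /opt/local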
To get the right python by default requires ensuring your shell $PATH is set properly to ensure that the right executable is found first. Another solution is to define shell aliases to the various pythons.
A python.org (MacPython) installation is fine, too, as others have suggested. easy_install can help but, again, because each Python instance may have its own easy_install command, make sure you are invoking the right easy_install.
If you use Python from MacPorts, it has its own easy_install located at: /opt/local/bin/easy_install-2.6 (for py26, that is). It’s not the same one as simply calling easy_install directly, even if you used python_select to change your default python command.
Have you looked into easy_install at all? It won’t synchronize your macports or anything like that, but it will automatically download the latest package and all necessary dependencies, i.e.
easy_install nose
for the nose unit testing package, or
easy_install trac
for the trac bug tracker.
There’s a bit more information on their EasyInstall page too.
When you install modules with MacPorts, it does not go into Apple’s version of Python. Instead those modules are installed onto the MacPorts version of Python selected.
You can change which version of Python is used by default using a MacPorts port called python_select. Instructions here.
Also, there’s easy_install. Which will use python to install python modules.
The __debug__ variable is handy in part because it affects every module. If I want to create another variable that works the same way, how would I do it?
The variable (let’s be original and call it ‘foo’) doesn’t have to be truly global, in the sense that if I change foo in one module, it is updated in others. I’d be fine if I could set foo before importing other modules and then they would see the same value for it.
I don’t endorse this solution in any way, shape or form. But if you add a variable to the __builtin__ module, it will be accessible as if a global from any other module that includes __builtin__ — which is all of them, by default.
a.py contains
print foo
b.py contains
import __builtin__
__builtin__.foo = 1
import a
The result is that “1” is printed.
Edit: The __builtin__ module is available as the local symbol __builtins__ — that’s the reason for the discrepancy between two of these answers. Also note that __builtin__ has been renamed to builtins in python3.
Define a module ( call it “globalbaz” ) and have the variables defined inside it. All the modules using this “pseudoglobal” should import the “globalbaz” module, and refer to it using “globalbaz.var_name”
This works regardless of the place of the change, you can change the variable before or after the import. The imported module will use the latest value. (I tested this in a toy example)
For clarification, globalbaz.py looks just like this:
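A minimal sketch (the variable name is a placeholder):

# globalbaz.py -- nothing but the shared names and their defaults
var_name = None

# any other module:
import globalbaz
globalbaz.var_name = "some value"    # this assignment is seen by every other importer
print(globalbaz.var_name)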
I believe that there are plenty of circumstances in which it does make sense and it simplifies programming to have some globals that are known across several (tightly coupled) modules. In this spirit, I would like to elaborate a bit on the idea of having a module of globals which is imported by those modules which need to reference them.
When there is only one such module, I name it “g”. In it, I assign default values for every variable I intend to treat as global. In each module that uses any of them, I do not use “from g import var”, as this only results in a local variable which is initialized from g only at the time of the import. I make most references in the form g.var, and the “g.” serves as a constant reminder that I am dealing with a variable that is potentially accessible to other modules.
If the value of such a global variable is to be used frequently in some function in a module, then that function can make a local copy: var = g.var. However, it is important to realize that assignments to var are local, and global g.var cannot be updated without referencing g.var explicitly in an assignment.
Note that you can also have multiple such globals modules shared by different subsets of your modules to keep things a little more tightly controlled. The reason I use short names for my globals modules is to avoid cluttering up the code too much with occurrences of them. With only a little experience, they become mnemonic enough with only 1 or 2 characters.
It is still possible to make an assignment to, say, g.x when x was not already defined in g, and a different module can then access g.x. However, even though the interpreter permits it, this approach is not so transparent, and I do avoid it. There is still the possibility of accidentally creating a new variable in g as a result of a typo in the variable name for an assignment. Sometimes an examination of dir(g) is useful to discover any surprise names that may have arisen by such accident.
You can already do this with module-level variables. Modules are the same no matter what module they’re being imported from. So you can make the variable a module-level variable in whatever module it makes sense to put it in, and access it or assign to it from other modules. It would be better to call a function to set the variable’s value, or to make it a property of some singleton object. That way if you end up needing to run some code when the variable’s changed, you can do so without breaking your module’s external interface.
It’s not usually a great way to do things — using globals seldom is — but I think this is the cleanest way to do it.
I want to add an answer for a case in which the variable will not be found: circular imports may break the module behavior.

For example:

first.py

import second
var = 1

second.py

import first
print(first.var)  # will throw an error because the order of execution happens before var gets declared
This sounds like modifying the __builtin__ name space. To do it:
import __builtin__
__builtin__.foo = 'some-value'
Do not use the __builtins__ directly (notice the extra “s”) – apparently this can be a dictionary or a module. Thanks to ΤΖΩΤΖΙΟΥ for pointing this out, more can be found here.
Now foo is available for use everywhere.
I don’t recommend doing this generally, but the use of this is up to the programmer.
Assigning to it must be done as above, just setting foo = 'some-other-value' will only set it in the current namespace.
I use this for a couple built-in primitive functions that I felt were really missing. One example is a find function that has the same usage semantics as filter, map, reduce.
def builtin_find(f, x, d=None):
    for i in x:
        if f(i):
            return i
    return d

import __builtin__
__builtin__.find = builtin_find
Once this is run (for instance, by importing near your entry point) all your modules can use find() as though, obviously, it was built in.
find(lambda i: i < 0, [1, 3, 0, -5, -10]) # Yields -5, the first negative.
Note: You can do this, of course, with filter and another line to test for zero length, or with reduce in one sort of weird line, but I always felt it was weird.
I could achieve cross-module modifiable (or mutable) variables by using a dictionary:
# in myapp.__init__
Timeouts = {}  # cross-modules global mutable variables for testing purpose
Timeouts['WAIT_APP_UP_IN_SECONDS'] = 60

# in myapp.mod1
from myapp import Timeouts

def wait_app_up(project_name, port):
    # wait for app until Timeouts['WAIT_APP_UP_IN_SECONDS']
    # ...

# in myapp.test.test_mod1
from myapp import Timeouts

def test_wait_app_up_fail(self):
    timeout_bak = Timeouts['WAIT_APP_UP_IN_SECONDS']
    Timeouts['WAIT_APP_UP_IN_SECONDS'] = 3
    with self.assertRaises(hlp.TimeoutException) as cm:
        wait_app_up(PROJECT_NAME, PROJECT_PORT)
    self.assertEqual("Timeout while waiting for App to start", str(cm.exception))
    Timeouts['WAIT_JENKINS_UP_TIMEOUT_IN_SECONDS'] = timeout_bak
When launching test_wait_app_up_fail, the actual timeout duration is 3 seconds.
I wondered if it would be possible to avoid some of the disadvantages of using global variables (see e.g. http://wiki.c2.com/?GlobalVariablesAreBad) by using a class namespace rather than a global/module namespace to pass values of variables. The following code indicates that the two methods are essentially identical. There is a slight advantage in using class namespaces as explained below.
The following code fragments also show that attributes or variables may be dynamically created and deleted in both global/module namespaces and class namespaces.
wall.py
# Note no definition of global variables

class router:
    """ Empty class """
I call this module ‘wall’ since it is used to bounce variables off of. It will act as a space to temporarily define global variables and class-wide attributes of the empty class ‘router’.
The next module, source.py, imports wall and defines a single function sourcefn which defines a message and emits it by two different mechanisms, one via globals and one via the router class. Note that the variables wall.msg and wall.router.msg are defined here for the first time in their respective namespaces.
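A minimal sketch of source.py consistent with that description and with the output shown further below (the exact message text is assumed):

source.py

import wall

def sourcefn():
    msg = 'Hello world!'
    wall.msg = msg            # via the module/global namespace
    wall.router.msg = msg     # via the class namespace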
dest.py
import wall

def destfn():
    if hasattr(wall, 'msg'):
        print 'global: ' + wall.msg
        del wall.msg
    else:
        print 'global: ' + 'no message'
    if hasattr(wall.router, 'msg'):
        print 'router: ' + wall.router.msg
        del wall.router.msg
    else:
        print 'router: ' + 'no message'
This module defines a function destfn which uses the two different mechanisms to receive the messages emitted by source. It allows for the possibility that the variable ‘msg’ may not exist. destfn also deletes the variables once they have been displayed.
main.py
import source, dest
source.sourcefn()
dest.destfn() # variables deleted after this call
dest.destfn()
This module calls the previously defined functions in sequence. After the first call to dest.destfn the variables wall.msg and wall.router.msg no longer exist.
The output from the program is:
global: Hello world!
router: Hello world!
global: no message
router: no message
The above code fragments show that the module/global and the class/class variable mechanisms are essentially identical.
If a lot of variables are to be shared, namespace pollution can be managed either by using several wall-type modules, e.g. wall1, wall2 etc. or by defining several router-type classes in a single file. The latter is slightly tidier, so perhaps represents a marginal advantage for use of the class-variable mechanism.
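For instance (a sketch extending the wall module above; the class names are illustrative):
# wall.py
class router_app:
    """ Empty class: namespace for application-level shared values """

class router_test:
    """ Empty class: namespace for test-only shared values """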
Many third-party Python modules have an attribute which holds the version information for the module (usually something like module.VERSION or module.__version__), however some do not.
Particular examples of such modules are libxslt and libxml2.
I need to check that the correct versions of these modules are being used at runtime. Is there a way to do this?
A potential solution would be to read in the source at runtime, hash it, and then compare it to the hash of the known version, but that’s nasty.
I’d stay away from hashing. The version of libxslt being used might contain some type of patch that doesn’t affect your use of it.
As an alternative, I’d like to suggest that you don’t check at run time (don’t know if that’s a hard requirement or not). For the python stuff I write that has external dependencies (3rd party libraries), I write a script that users can run to check their python install to see if the appropriate versions of modules are installed.
For the modules that don’t have a defined ‘version’ attribute, you can inspect the interfaces they contain (classes and methods) and see whether they match the interface you expect. Then, in the actual code that you’re working on, assume that the 3rd-party modules have the interface you expect.
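A rough sketch of that kind of interface check (the helper and the attribute names below are illustrative, not a real API):
import importlib

def has_expected_interface(module_name, expected_names):
    """Return True if the named module exposes every attribute we rely on."""
    mod = importlib.import_module(module_name)
    return all(hasattr(mod, name) for name in expected_names)

# e.g. fail fast at startup if the installed bindings look wrong:
# assert has_expected_interface('libxml2', ['parseDoc', 'parseFile'])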
If you’re on python >=3.8 you can use a module from the built-in library for that. To check a package’s version (in this example lxml) run:
>>> from importlib.metadata import version
>>> version('lxml')
'4.3.1'
This functionality has been ported to older versions of python (<3.8) as well, but you need to install a separate library first:
pip install importlib_metadata
and then to check a package’s version (in this example lxml) run:
>>> from importlib_metadata import version
>>> version('lxml')
'4.3.1'
Keep in mind that this works only for distributions that were actually installed (for example via pip), not for modules that merely sit on the import path. Also, you must pass the package (distribution) name as the argument to the version function, rather than the name of the module that the package provides (although the two are usually the same).
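For example (assuming Pillow, the maintained fork of PIL, is installed), the distribution name is what counts, not the import name:
from importlib.metadata import version, PackageNotFoundError

version('Pillow')    # works: 'Pillow' is the name of the installed distribution
try:
    version('PIL')   # 'PIL' is only the module that Pillow provides...
except PackageNotFoundError:
    pass             # ...so this raises PackageNotFoundError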
I found it quite unreliable to use the various tools available (including the best one, pkg_resources, mentioned in another answer), as most of them do not cover all cases. For example:
built-in modules
modules not installed but just added to the python path (by your IDE for example)
two versions of the same module available (one in python path superseding the one installed)
Since we needed a reliable way to get the version of any package, module or submodule, I ended up writing getversion. It is quite simple to use:
from getversion import get_module_version
import foo
version, details = get_module_version(foo)
In my case, Python was unable to find it because I’d put the code inside a module with hyphens, e.g. my-module. When I changed it to my_module it worked.
The following doesn’t solve the OP’s problem, but the title and error are exactly what I faced.
If your project has a setup.py script in it, you can install the package you are in with python3 -m pip install -e ., python3 setup.py install, or python3 setup.py develop, and the package will be installed but remain editable (so changes to the code are picked up when importing it). If it doesn’t have a setup.py, you’ll need to create one (or otherwise make the project installable) first.
In any case, the problem the OP faced may no longer occur on current Python versions.
file one.py:
def function():
    print("output")
file two.py:
#!/usr/bin/env python3
import one
one.function()
chmod +x two.py # To allow execution of the python file
./two.py # Only works if you have a python shebang
Command line output: output
Other solutions seem ‘dirty’
In the case of the OP with two test files, modifying them to work is probably fine. However, in other real scenarios, the methods listed in the other answers are probably not recommended. They require you to modify the Python code or restrict your flexibility (for example, by running the Python file from a specific directory), and they generally introduce annoyances. What if you’ve just cloned a project and this happens? It probably already works for other people, and making code changes is unnecessary. The chosen answer also wants people to run a script from a specific folder to make it work, which can be a source of long-term annoyance. It also suggests adding your specific Python folder to PATH (which can be done through Python or the command line). Again, what happens if you rename or move the folder in a few months? You have to hunt down this page again, eventually discover that you need to set the path (and that you did exactly this a few months ago), and update it (sure, you could set sys.path programmatically, but that can still be flaky). Many sources of great annoyance.
The python interpreter has a -m module option that “Runs library module module as a script”.
With this python code a.py:
if __name__ == "__main__":
    print __package__
    print __name__
I tested python -m a to get
"" <-- Empty String
__main__
whereas python a.py returns
None <-- None
__main__
To me, those two invocations seem to be the same except that __package__ is not None when invoked with the -m option.
Interestingly, with python -m runpy a, I get the same output as with python -m a once the module has been compiled to a.pyc.
What’s the (practical) difference between these invocations? Any pros and cons between them?
Also, David Beazley’s Python Essential Reference explains it as “The -m option runs a library module as a script which executes inside the __main__ module prior to the execution of the main script“. What does it mean?
When you use the -m command-line flag, Python will import a module or package for you, then run it as a script. When you don’t use the -m flag, the file you named is run as just a script.
The distinction is important when you try to run a package. There is a big difference between:
python foo/bar/baz.py
and
python -m foo.bar.baz
as in the latter case, foo.bar is imported and relative imports will work correctly with foo.bar as the starting point.
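A sketch of the demonstration being referred to (the package layout and the contents of baz.py are inferred from the shell session further down):
$ mkdir -p test/foo/bar
$ touch test/foo/__init__.py test/foo/bar/__init__.py
$ printf 'print(__package__)\nprint(__name__)\n' > test/foo/bar/baz.py
$ PYTHONPATH=test python test/foo/bar/baz.py
None
__main__
$ PYTHONPATH=test python -m foo.bar.baz
foo.bar
__main__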
As a result, Python has to actually care about packages when using the -m switch. A normal script can never be a package, so __package__ is set to None.
But run a package or module inside a package with -m and now there is at least the possibility of a package, so the __package__ variable is set to a string value; in the above demonstration it is set to 'foo.bar', for plain modules not inside a package it is set to an empty string.
As for the __main__ module, Python imports scripts being run as it would import regular modules. A new module object is created to hold the global namespace and is stored in sys.modules['__main__']. This is what the __name__ variable refers to; it is a key in that structure.
For packages, you can create a __main__.py module inside and have that run when running python -m package_name; in fact that is the only way you can run a package as a script:
$ PYTHONPATH=test python -m foo.bar
python: No module named foo.bar.__main__; 'foo.bar' is a package and cannot be directly executed
$ cp test/foo/bar/baz.py test/foo/bar/__main__.py
$ PYTHONPATH=test python -m foo.bar
foo.bar
__main__
So, when naming a package for running with -m, Python looks for a __main__ module contained in that package and executes that as a script. Its name is then still set to '__main__' and the module object is still stored in sys.modules['__main__'].
$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 40.3 usec per loop
$ python -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 3: 33.4 usec per loop
$ python -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 3: 25.2 usec per loop
The results are pretty much the same when you have a script, but when you develop a package, without the -m flag there’s no way to get the imports to work correctly if you want to run a subpackage or module in the package as the main entry point to your program (and believe me, I’ve tried).
From the docs for the -m option:
Search sys.path for the named module and execute its contents as the __main__ module.
and
As with the -c option, the current directory will be added to the start of sys.path.
so
python -m pdb
is roughly equivalent to
python /usr/lib/python3.5/pdb.py
(assuming you don’t have a package or script in your current directory called pdb.py)
Explanation:
Behavior is made “deliberately similar to” scripts.
Many standard library modules contain code that is invoked when they are executed as a script; some Python code is intended to be run as a module. An example is the timeit module (I think this example is better than the one in the command-line option docs):
$ python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 3: 40.3 usec per loop
$ python -m timeit '"-".join([str(n) for n in range(100)])'
10000 loops, best of 3: 33.4 usec per loop
$ python -m timeit '"-".join(map(str, range(100)))'
10000 loops, best of 3: 25.2 usec per loop
The -m command line option – python -m modulename will find a module
in the standard library, and invoke it. For example, python -m pdb
is equivalent to python /usr/lib/python2.4/pdb.py
Follow-up Question
Also, David Beazley’s Python Essential Reference explains it as “The
-m option runs a library module as a script which executes inside the __main__ module prior to the execution of the main script”.
It means any module you can look up with an import statement can be run as the entry point of the program – if it has a code block, usually near the end, with if __name__ == '__main__':.
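For example, a minimal module of that shape (the name mytool is purely illustrative) can be run either as python mytool.py or as python -m mytool:
# mytool.py
def main():
    print('doing the actual work')

if __name__ == '__main__':
    main()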
-m without adding the current directory to the path:
A comment elsewhere on this page says:
That the -m option also adds the current directory to sys.path, is obviously a security issue (see: preload attack). This behavior is similar to library search order in Windows (before it had been hardened recently). It’s a pity that Python does not follow the trend and does not offer a simple way to disable adding . to sys.path
Well, the following demonstrates the possible issue (on Windows, remove the quotes):
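A sketch of such a preload attack, using pdb.py as the shadowing file (any importable name would do):
$ echo 'print("this local file shadows the standard library module")' > pdb.py
$ python -m pdb
this local file shadows the standard library module
$ rm pdb.py
Since Python 3.4 the -I flag can be used to lock this down; from the docs: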
Run Python in isolated mode. This also implies -E and -s. In isolated mode sys.path contains neither the script’s directory nor the user’s site-packages directory. All PYTHON* environment variables are ignored, too. Further restrictions may be imposed to prevent the user from injecting malicious code.
The main reason to run a module (or package) as a script with -m is to simplify deployment, especially on Windows. You can install scripts in the same place in the Python library where modules normally go – instead of polluting PATH or global executable directories such as ~/.local (the per-user scripts directory is ridiculously hard to find in Windows).
Then you just type -m and Python finds the script automagically. For example, python -m pip will find the correct pip for the same instance of the Python interpreter which executes it. Without -m, if the user has several Python versions installed, which one would the “global” pip belong to?
If the user prefers “classic” entry points for command-line scripts, these can easily be added as small scripts somewhere in PATH, or pip can create them at install time via the entry_points parameter in setup.py.
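A minimal sketch of such an entry point in setup.py (all names are illustrative):
from setuptools import setup

setup(
    name='mytool',
    version='0.1',
    py_modules=['mytool'],
    entry_points={
        'console_scripts': [
            'mytool = mytool:main',  # installs a 'mytool' command that calls mytool.main()
        ],
    },
)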
So just check for __name__ == '__main__' and ignore other, less reliable implementation details.