For anyone interested in computing multiple distances at once, I’ve done a little comparison using perfplot (a small project of mine).
The first advice is to organize your data such that the arrays have dimension (3, n) (and are C-contiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn’t matter too much if you use sqrt-sum with axis=0, linalg.norm with axis=0, or
a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))
which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)
The variants where you sum up over the second axis, axis=1, are all substantially slower.
Code to reproduce the plot:
import numpy
import perfplot
from scipy.spatial import distance
def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)

def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)

def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))

def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))

def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))

def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))

def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))

def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1

perfplot.save(
    "norm.png",
    setup=setup,
    n_range=[2 ** k for k in range(22)],
    kernels=[
        linalg_norm,
        linalg_norm_T,
        scipy_distance,
        sqrt_sum,
        sqrt_sum_T,
        sqrt_einsum,
        sqrt_einsum_T,
    ],
    logx=True,
    logy=True,
    xlabel="len(x), len(y)",
)
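Answer 3
I want to expound on the simple answer with various performance notes. np.linalg.norm will perhaps do more than you need:
dist = numpy.linalg.norm(a - b)
Firstly, this function is designed to work over a list and return all of the values, e.g. to compare the distance from pA to the set of points sP:
sP = np.array(points)  # the points to compare against, one per row
pA = point
distances = np.linalg.norm(sP - pA, ord=2, axis=1)  # 'distances' holds one value per row of sP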
Firstly – every time we call it, we have to do a global lookup for “np”, a scoped lookup for “linalg” and a scoped lookup for “norm”, and the overhead of merely calling the function can equate to dozens of python instructions.
Lastly, we wasted two operations storing the result and reloading it for the return…
First pass at improvement: make the lookup faster, skip the store
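A minimal sketch of that first pass, assuming points are plain coordinate tuples and using math.sqrt in place of np.linalg.norm so that it runs with the standard library alone. The names distance_naive, distance and _sqrt are illustrative and not taken from the original answer:

```python
import math


def distance_naive(point_a, point_b):
    # Every call looks up the global "math", then its attribute "sqrt",
    # and stores the result in a local before loading it again to return.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))
    return dist


def distance(point_a, point_b, _sqrt=math.sqrt):
    # _sqrt is bound once, at definition time, so each call only needs a
    # fast local lookup; the result is returned without an extra store.
    return _sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))
```

Both functions return the same value; only the per-call overhead differs, and whether the difference is measurable depends on the interpreter.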
The function call overhead still amounts to some work, though. And you’ll want to do benchmarks to determine whether you might be better doing the math yourself:
On some platforms, **0.5 is faster than math.sqrt. Your mileage may vary.
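One way to run such a benchmark is sketched below with the standard timeit module. The function names dist_pow, dist_sqrt and benchmark are made up for this illustration, points are assumed to be 3-tuples, and no timings are quoted because they vary by machine and Python version:

```python
import math
import timeit


def dist_pow(a, b):
    # Square root via the power operator.
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5


def dist_sqrt(a, b):
    # Square root via math.sqrt.
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)


def benchmark(number=100000):
    # Returns the total seconds taken by `number` calls of each variant.
    p, q = (1.0, 2.0, 6.0), (-2.0, 3.0, 2.0)
    return {
        "pow": timeit.timeit(lambda: dist_pow(p, q), number=number),
        "sqrt": timeit.timeit(lambda: dist_sqrt(p, q), number=number),
    }
```

Calling benchmark() a few times and comparing the two entries gives a rough answer for the platform at hand.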
**** Advanced performance notes.
Why are you calculating distance? If the sole purpose is to display it,
print("The target is %.2fm away" % (distance(a, b)))
move along. But if you’re comparing distances, doing range checks, etc., I’d like to add some useful performance observations.
Let’s take two cases: sorting by distance or culling a list to items that meet a range constraint.
# Ultra naive implementations. Hold onto your hat.
def sort_things_by_distance(origin, things):
    # list.sort() sorts in place and returns None, so use sorted() to return the result.
    return sorted(things, key=lambda thing: distance(origin, thing))

def in_range(origin, range, things):
    things_in_range = []
    for thing in things:
        if distance(origin, thing) <= range:
            things_in_range.append(thing)
    return things_in_range
The first thing we need to remember is that we are using Pythagoras to calculate the distance (dist = sqrt(x^2 + y^2 + z^2)) so we’re making a lot of sqrt calls. Math 101:
dist = root ( x^2 + y^2 + z^2 )
:.
dist^2 = x^2 + y^2 + z^2
and
sq(N) < sq(M) iff M > N
and
sq(N) > sq(M) iff N > M
and
sq(N) = sq(M) iff N == M
In short: until we actually require the distance in a unit of X rather than X^2, we can eliminate the hardest part of the calculations.
# Still naive, but much faster.
def distance_sq(left, right):
    """ Returns the square of the distance between left and right. """
    return (
        ((left.x - right.x) ** 2) +
        ((left.y - right.y) ** 2) +
        ((left.z - right.z) ** 2)
    )

def sort_things_by_distance(origin, things):
    # list.sort() sorts in place and returns None, so use sorted() to return the result.
    return sorted(things, key=lambda thing: distance_sq(origin, thing))

def in_range(origin, range, things):
    things_in_range = []
    # Remember that sqrt(N)**2 == N, so if we square
    # range, we don't need to root the distances.
    range_sq = range**2
    for thing in things:
        if distance_sq(origin, thing) <= range_sq:
            things_in_range.append(thing)
    return things_in_range
Great, both functions no longer do any expensive square roots. That’ll be much faster. We can also improve in_range by converting it to a generator:
def in_range(origin, range, things):
    range_sq = range**2
    yield from (thing for thing in things
                if distance_sq(origin, thing) <= range_sq)
This especially has benefits if you are doing something like:
if any(in_range(origin, max_dist, things)):
    ...
But if the very next thing you are going to do requires a distance,
for nearby in in_range(origin, walking_distance, hotdog_stands):
    print("%s %.2fm" % (nearby.name, distance(origin, nearby)))
consider yielding tuples:
def in_range_with_dist_sq(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = distance_sq(origin, thing)
        if dist_sq <= range_sq:
            yield (thing, dist_sq)
This can be especially useful if you might chain range checks (‘find things that are near X and within Nm of Y’), since you don’t have to calculate the distance again.
But what about if we’re searching a really large list of things and we anticipate a lot of them not being worth consideration?
There is actually a very simple optimization:
def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    for thing in things:
        dist_sq = (origin.x - thing.x) ** 2
        if dist_sq <= range_sq:
            dist_sq += (origin.y - thing.y) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
Whether this is useful will depend on the size of ‘things’.
def in_range_all_the_things(origin, range, things):
    range_sq = range**2
    if len(things) >= 4096:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2
                if dist_sq <= range_sq:
                    dist_sq += (origin.z - thing.z) ** 2
                    if dist_sq <= range_sq:
                        yield thing
    elif len(things) > 32:
        for thing in things:
            dist_sq = (origin.x - thing.x) ** 2
            if dist_sq <= range_sq:
                dist_sq += (origin.y - thing.y) ** 2 + (origin.z - thing.z) ** 2
                if dist_sq <= range_sq:
                    yield thing
    else:
        # Small input: just calculate the squared distance and range-check it.
        for thing in things:
            if distance_sq(origin, thing) <= range_sq:
                yield thing
And again, consider yielding the dist_sq. Our hotdog example then becomes:
# Chaining generators
info = in_range_with_dist_sq(origin, walking_distance, hotdog_stands)
# The tuple needs its own parentheses inside a generator expression.
info = ((stand, dist_sq**0.5) for stand, dist_sq in info)
for stand, dist in info:
    print("%s %.2fm" % (stand, dist))
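For readers who want to run the chained version end to end, here is a small self-contained sketch. The Stand record, the sample coordinates and the parameter name max_dist are assumptions made for illustration; any object with x, y and z attributes would work the same way:

```python
from collections import namedtuple

# Hypothetical record type used only for this example.
Stand = namedtuple("Stand", "name x y z")


def distance_sq(left, right):
    """Return the squared distance between two objects with x, y, z."""
    return ((left.x - right.x) ** 2 +
            (left.y - right.y) ** 2 +
            (left.z - right.z) ** 2)


def in_range_with_dist_sq(origin, max_dist, things):
    # Compare squared values so that no square root is needed while filtering.
    max_dist_sq = max_dist ** 2
    for thing in things:
        dist_sq = distance_sq(origin, thing)
        if dist_sq <= max_dist_sq:
            yield thing, dist_sq


origin = Stand("me", 0, 0, 0)
stands = [Stand("near", 3, 4, 0), Stand("far", 30, 40, 0)]

# The root is taken only for the items that survived the range check.
nearby = [(stand.name, dist_sq ** 0.5)
          for stand, dist_sq in in_range_with_dist_sq(origin, 10, stands)]
```

Only one square root is computed here, for the single stand that passes the filter.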
Starting with Python 3.8, the math module directly provides the dist function, which returns the Euclidean distance between two points (given as tuples or lists of coordinates):
from math import dist
dist((1, 2, 6), (-2, 3, 2)) # 5.0990195135927845
And if you’re working with lists:
dist([1, 2, 6], [-2, 3, 2]) # 5.0990195135927845
Answer 6
It can be done like the following. I don’t know how fast it is, but it doesn’t use NumPy.
from math import sqrt
a = (1, 2, 3)  # Data point 1
b = (4, 5, 6)  # Data point 2
print(sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
However, if speed is a concern I would recommend experimenting on your machine. I’ve found that using math library’s sqrt with the ** operator for the square is much faster on my machine than the one-liner NumPy solution.
I ran my tests using this simple program:
#!/usr/bin/python
import math
import numpy
from random import uniform
def fastest_calc_dist(p1, p2):
    return math.sqrt((p2[0] - p1[0]) ** 2 +
                     (p2[1] - p1[1]) ** 2 +
                     (p2[2] - p1[2]) ** 2)

def math_calc_dist(p1, p2):
    return math.sqrt(math.pow((p2[0] - p1[0]), 2) +
                     math.pow((p2[1] - p1[1]), 2) +
                     math.pow((p2[2] - p1[2]), 2))

def numpy_calc_dist(p1, p2):
    return numpy.linalg.norm(numpy.array(p1) - numpy.array(p2))

TOTAL_LOCATIONS = 1000
p1 = dict()
p2 = dict()
for i in range(0, TOTAL_LOCATIONS):
    p1[i] = (uniform(0, 1000), uniform(0, 1000), uniform(0, 1000))
    p2[i] = (uniform(0, 1000), uniform(0, 1000), uniform(0, 1000))
total_dist = 0
for i in range(0, TOTAL_LOCATIONS):
    for j in range(0, TOTAL_LOCATIONS):
        dist = fastest_calc_dist(p1[i], p2[j])  # change this line for testing
        total_dist += dist
print(total_dist)
On my machine, math_calc_dist runs much faster than numpy_calc_dist: 1.5 seconds versus 23.5 seconds.
To get a measurable difference between fastest_calc_dist and math_calc_dist I had to up TOTAL_LOCATIONS to 6000. Then fastest_calc_dist takes ~50 seconds while math_calc_dist takes ~60 seconds.
You can also experiment with numpy.sqrt and numpy.square though both were slower than the math alternatives on my machine.
My tests were run with Python 2.6.6.
Answer 10
You can just subtract the vectors and then take the inner product.
Following your example,
a = numpy.array((xa, ya, za))
b = numpy.array((xb, yb, zb))
tmp = a - b
sum_squared = numpy.dot(tmp.T, tmp)
result = numpy.sqrt(sum_squared)
Return the Euclidean distance between two points p and q, each given
as a sequence (or iterable) of coordinates. The two points must have
the same dimension.
Roughly equivalent to:
sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q)))
import numpy as np
from scipy.spatial import distance
input_arr = np.array([[0,3,0],[2,0,0],[0,1,3],[0,1,2],[-1,0,1],[1,1,1]])
test_case = np.array([0,0,0])
dst=[]
for i in range(0, 6):
    temp = distance.euclidean(test_case, input_arr[i])
    dst.append(temp)
print(dst)
Answer 17
import math
dist = math.hypot(math.hypot(xa-xb, ya-yb), za-zb)
which does actually nothing more than using Pythagoras’ theorem to calculate the distance, by adding the squares of Δx, Δy and Δz and rooting the result.
Find difference of two matrices first. Then, apply element wise multiplication with numpy’s multiply command. After then, find summation of the element wise multiplied new matrix. Finally, find square root of the summation.
def findEuclideanDistance(a, b):
    euclidean_distance = a - b
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance
Answer 20
import numpy as np
# any two python array as two points
a = [0, 0]
b = [3, 4]
The first method converts each list to a NumPy array and subtracts them: print(np.linalg.norm(np.array(a) - np.array(b))). The second method works directly on the Python lists: print(np.linalg.norm(np.subtract(a, b)))
Python actually does have something already built in for this, the ability to do operations such as '{0:b}'.format(42), which will give you the bit pattern (in a string) for 42, or 101010.
For a more general philosophy, no language or library will give its user base everything that they desire. If you’re working in an environment that doesn’t provide exactly what you need, you should be collecting snippets of code as you develop to ensure you never have to write the same thing twice. Such as, for example, the pseudo-code:
define intToBinString, receiving intVal:
    if intVal is equal to zero:
        return "0"
    set strVal to ""
    while intVal is greater than zero:
        if intVal is odd:
            prefix "1" to strVal
        else:
            prefix "0" to strVal
        divide intVal by two, rounding down
    return strVal
which will construct your binary string based on the decimal value. Just keep in mind that’s a generic bit of pseudo-code which may not be the most efficient way of doing it though, with the iterations you seem to be proposing, it won’t make much difference. It’s really just meant as a guideline on how it could be done.
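As a rough illustration, the pseudo-code can be transcribed into Python as follows. The function name int_to_bin_string is just a Python-style spelling of the pseudo-code name, and the sketch assumes a non-negative integer argument (a negative value would yield an empty string):

```python
def int_to_bin_string(int_val):
    """Build the binary digits of a non-negative integer, as in the pseudo-code."""
    if int_val == 0:
        return "0"
    str_val = ""
    while int_val > 0:
        # Prefix the lowest bit, then drop it from the number.
        str_val = ("1" if int_val % 2 else "0") + str_val
        int_val //= 2  # divide by two, rounding down
    return str_val
```

In practice format(x, 'b') does the same job; the loop is only meant to show how the snippet maps onto real code.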
The general idea is to use code from (in order of preference):
the language or built-in libraries.
third-party libraries with suitable licenses.
your own collection.
something new you need to write (and save in your own collection for later).
def get_bin(x, n=0):
    """
    Get the binary representation of x.

    Parameters
    ----------
    x : int
    n : int
        Minimum number of digits. If x needs less digits in binary, the rest
        is filled with zeros.

    Returns
    -------
    str
    """
    return format(x, 'b').zfill(n)
Answer 4
For reference:
def toBinary(n):
    return ''.join(str(1 & int(n) >> i) for i in range(64)[::-1])
This function can convert a positive integer as large as 18446744073709551615, represented as string '1111111111111111111111111111111111111111111111111111111111111111'.
It can be modified to serve a much larger integer, though it may not be as handy as "{0:b}".format() or bin().
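One possible modification, sketched here under the assumption that the input is a non-negative integer, is to derive the width from int.bit_length() instead of hard-coding 64. The name to_binary and the optional width parameter are introduced for this example only:

```python
def to_binary(n, width=None):
    """Return the bits of a non-negative integer, optionally zero-padded to `width`."""
    n = int(n)
    if n < 0:
        raise ValueError("to_binary() expects a non-negative integer")
    if width is None:
        # bit_length() is 0 for zero, so emit at least one digit.
        width = max(n.bit_length(), 1)
    # Walk from the most significant bit down to bit 0.
    return ''.join(str((n >> i) & 1) for i in range(width - 1, -1, -1))
```

With an explicit width smaller than the number needs, the high bits are silently dropped, which mirrors the fixed 64-bit behaviour of the original function.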
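from time import time

t1 = time()
for i in range(1000000):
    binary(i)  # binary() is the hand-written converter being timed; its definition is not shown here
t2 = time()
print(t2 - t1)
# 6.57236599922
compared to
t1 = time()
for i in range(1000000):
    '{0:b}'.format(i)
t2 = time()
print(t2 - t1)
# 0.68017411232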
Answer 8
Summary of alternatives:
n = 42
assert "-101010" == format(-n, 'b')
assert "-101010" == "{0:b}".format(-n)
assert "-101010" == (lambda x: x >= 0 and str(bin(x))[2:] or "-" + str(bin(x))[3:])(-n)
assert "0b101010" == bin(n)
assert "101010" == bin(n)[2:]  # But this won't work for negative numbers.
Examples
--------
>>> a = np.array([[2], [7], [23]], dtype=np.uint8)
>>> a
array([[ 2],
       [ 7],
       [23]], dtype=uint8)
>>> b = np.unpackbits(a, axis=1)
>>> b
array([[0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
f = str(bin(10))
c = []
# f[2:] drops the "0b" prefix; join() needs strings, so map with str rather than int.
c.append("".join(map(str, f[2:])))
print(c)
Answer 17
Here is the code I’ve just implemented. This is not a method but you can use it as a ready-to-use function!
def inttobinary(number):
    if number == 0:
        return str(0)
    result = ""
    while number != 0:
        remainder = number % 2
        number = number // 2  # integer division; "/" yields a float in Python 3
        result += str(remainder)
    return result[::-1]  # to invert the string
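Answer 18
Here is a simple solution using the divmod() function, which returns the quotient of the division without the fractional part, together with the remainder.
def dectobin(number):
    bits = ''
    while number >= 1:
        number, rem = divmod(number, 2)
        bits = str(rem) + bits  # prepend, so the most significant bit comes first
    return bits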
def to_bin(dec):
    flag = True
    bin_str = ''
    while flag:
        remainder = dec % 2
        quotient = dec // 2  # integer division; "/" yields a float in Python 3
        if quotient == 0:
            flag = False
        bin_str += str(remainder)
        dec = quotient
    bin_str = bin_str[::-1]  # reverse the string
    return bin_str
Answer 22
Here’s yet another way using regular math, no loops, only recursion. (Trivial case 0 returns nothing).
def toBin(num):
    if num == 0:
        return ""
    return toBin(num // 2) + str(num % 2)

print([(toBin(i)) for i in range(10)])
['', '1', '10', '11', '100', '101', '110', '111', '1000', '1001']
Answer 23
A calculator with all the necessary functions for DEC, BIN and HEX (made and tested with Python 3.5):
You can change the input test numbers and get the converted ones.
# CONVERTER: DEC / BIN / HEX

def dec2bin(d):  # dec -> bin
    b = bin(d)
    return b

def dec2hex(d):  # dec -> hex
    h = hex(d)
    return h

def bin2dec(b):  # bin -> dec
    bin_numb = "{0:b}".format(b)
    d = int(bin_numb, 2)  # parse the digits as base 2 instead of eval()-ing them
    return d, bin_numb

def bin2hex(b):  # bin -> hex
    h = hex(b)
    return h

def hex2dec(h):  # hex -> dec
    d = int(h)
    return d

def hex2bin(h):  # hex -> bin
    b = bin(h)
    return b

## TESTING NUMBERS
numb_dec = 99
numb_bin = 0b0111
numb_hex = 0xFF

## CALCULATIONS
res_dec2bin = dec2bin(numb_dec)
res_dec2hex = dec2hex(numb_dec)
res_bin2dec, bin_numb = bin2dec(numb_bin)
res_bin2hex = bin2hex(numb_bin)
res_hex2dec = hex2dec(numb_hex)
res_hex2bin = hex2bin(numb_hex)

## PRINTING
print('------- DECIMAL to BIN / HEX -------\n')
print('decimal:', numb_dec, '\nbin: ', res_dec2bin, '\nhex: ', res_dec2hex, '\n')
print('------- BINARY to DEC / HEX -------\n')
print('binary: ', bin_numb, '\ndec: ', res_bin2dec, '\nhex: ', res_bin2hex, '\n')
print('----- HEXADECIMAL to BIN / DEC -----\n')
print('hexadec:', hex(numb_hex), '\nbin: ', res_hex2bin, '\ndec: ', res_hex2dec, '\n')
This error message means that you are attempting to use Python 3 to follow an example or run a program that uses the Python 2 print statement:
print "Hello, World!"
The statement above does not work in Python 3. In Python 3 you need to add parentheses around the value to be printed:
print("Hello, World!")
“SyntaxError: Missing parentheses in call to ‘print’” is a new error message that was added in Python 3.4.2 primarily to help users that are trying to follow a Python 2 tutorial while running Python 3.
In Python 3, printing values changed from being a distinct statement to being an ordinary function call, so it now needs parentheses:
>>> print("Hello, World!")
Hello, World!
In earlier versions of Python 3, the interpreter just reports a generic syntax error, without providing any useful hints as to what might be going wrong:
As for why print became an ordinary function in Python 3, that didn’t relate to the basic form of the statement, but rather to how you did more complicated things like printing multiple items to stderr with a trailing space rather than ending the line.
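To make that "more complicated" case concrete, here is a small sketch of how it looks with the Python 3 function form. The helper name report and its stream parameter are invented for this example; file and end are the real keyword arguments of print:

```python
import io
import sys


def report(*items, stream=sys.stderr):
    # Roughly what Python 2 spelled as:  print >>stream, item1, item2,
    # Items are separated by a space and followed by a space, not a newline.
    print(*items, file=stream, end=" ")


# Writing into an in-memory buffer shows exactly what gets emitted.
buffer = io.StringIO()
report("disk", "full", stream=buffer)
emitted = buffer.getvalue()
```

Because the destination and line ending are ordinary keyword arguments, they can be passed around, defaulted or overridden like any other parameter.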
Starting with the Python 3.6.3 release in September 2017, some error messages related to the Python 2.x print syntax have been updated to recommend their Python 3.x counterparts:
>>> print "Hello!"
File "<stdin>", line 1
print "Hello!"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello!")?
Since the “Missing parentheses in call to print” case is a compile time syntax error and hence has access to the raw source code, it’s able to include the full text on the rest of the line in the suggested replacement. However, it doesn’t currently try to work out the appropriate quotes to place around that expression (that’s not impossible, just sufficiently complicated that it hasn’t been done).
The TypeError raised for the right shift operator has also been customised:
>>> print >> sys.stderr
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for >>: 'builtin_function_or_method' and '_io.TextIOWrapper'. Did you mean "print(<message>, file=<output_stream>)"?
Since this error is raised when the code runs, rather than when it is compiled, it doesn’t have access to the raw source code, and hence uses meta-variables (<message> and <output_stream>) in the suggested replacement expression instead of whatever the user actually typed. Unlike the syntax error case, it’s straightforward to place quotes around the Python expression in the custom right shift error message.
I could also just add that I knew everything about the syntax change between Python2.7 and Python3, and my code was correctly written as print("string") and even
print(f"string")…
But after some time of debugging I realized that my bash script was calling python like:
python file_name.py
which had the effect of calling my python script by default using python2.7 which gave the error. So I changed my bash script to:
python3 file_name.py
which of course uses python3 to run the script, which fixed the error.
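A guard along the following lines can make this kind of mix-up fail with a clearer message. It is only a sketch: the function name require_python3 is made up, and since a syntax error anywhere in a file is raised before any of its code runs, the guard helps only when it lives in a small entry-point file that itself parses under Python 2:

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a readable error when the running interpreter is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3, but it is running under Python %d.%d; "
            "try invoking it with python3." % (version_info[0], version_info[1])
        )
    return True
```

Calling require_python3() at the top of the entry point, before importing the rest of the program, reports the wrong interpreter instead of a confusing print-related syntax error.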
Outside of the direct answers here, one should note the other key difference between python 2 and 3. The official python wiki goes into almost all of the major differences and focuses on when you should use either of the versions. This blog post also does a fine job of explaining the current python universe and the somehow unsolved puzzle of moving to python 3.
As far as I can tell, you are beginning to learn the python language. You should consider the aforementioned articles before you continue down the python 3 route. Not only will you have to change some of your syntax, you will also need to think about which packages will be available to you (an advantage of python 2) and potential optimizations that could be made in your code (an advantage of python 3).
Relative imports use a module’s __name__ attribute to determine that module’s position in the package hierarchy. If the module’s name does not contain any package information (e.g. it is set to ‘__main__’) then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system.
In Python 2.6, they’re adding the ability to reference modules relative to the main module. PEP 366 describes the change.
Update: According to Nick Coghlan, the recommended alternative is to run the module inside the package using the -m switch.
Answer 1
Here is the solution which works for me:
I do the relative imports as from ..sub2 import mod2
and then, if I want to run mod1.py then I go to the parent directory of app and run the module using the python -m switch as python -m app.sub1.mod1.
The real reason why this problem occurs with relative imports is that relative imports work by taking the __name__ property of the module. If the module is being run directly, then __name__ is set to __main__ and it doesn’t contain any information about the package structure. And that’s why Python complains about the relative import in non-package error.
So, by using the -m switch you provide the package structure information to python, through which it can resolve the relative imports successfully.
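The difference can be reproduced with a throwaway package. The sketch below builds the app/sub1/mod1.py and app/sub2/mod2.py layout from this answer in a temporary directory and runs mod1 both ways; the function name run_relative_import_demo, the file contents and the VALUE constant are assumptions made for the demonstration, and it relies on Python 3.7 or later for capture_output:

```python
import os
import subprocess
import sys
import tempfile


def run_relative_import_demo():
    """Run app.sub1.mod1 with -m and as a plain script; return both results."""
    with tempfile.TemporaryDirectory() as root:
        files = {
            "app/__init__.py": "",
            "app/sub1/__init__.py": "",
            "app/sub2/__init__.py": "",
            "app/sub2/mod2.py": "VALUE = 'hello from mod2'\n",
            "app/sub1/mod1.py": "from ..sub2 import mod2\nprint(mod2.VALUE)\n",
        }
        for rel_path, source in files.items():
            full_path = os.path.join(root, *rel_path.split("/"))
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
            with open(full_path, "w") as handle:
                handle.write(source)

        # From the parent directory of app, with -m: the package structure
        # is known, so the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "app.sub1.mod1"],
            cwd=root, capture_output=True, text=True,
        )
        # Running the file directly: no package information is available,
        # so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, os.path.join(root, "app", "sub1", "mod1.py")],
            cwd=root, capture_output=True, text=True,
        )
        return as_module, as_script
```

The first run prints the value imported from mod2, while the second exits with an ImportError about the relative import.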
I have encountered this problem many times while doing relative imports. And, after reading all the previous answers, I was still not able to figure out how to solve it, in a clean way, without needing to put boilerplate code in all files. (Though some of the comments were really helpful, thanks to @ncoghlan and @XiongChiamiov)
Hope this helps someone who is fighting with relative imports problem, because going through PEP is really not fun.
Alternatively 2 or 3 could use: from app.package_a import module_a
That will work as long as you have app in your PYTHONPATH. main.py could be anywhere then.
So you write a setup.py to copy (install) the whole app package and subpackages to the target system’s python folders, and main.py to target system’s script folders.
“Guido views running scripts within a package as an anti-pattern” (rejected PEP-3122)
I have spent so much time trying to find a solution, reading related posts here on Stack Overflow and saying to myself “there must be a better way!”. Looks like there is not.
import os
import sys
from importlib import reload  # reload() is a builtin only in Python 2

def import_path(fullpath):
    """
    Import a file with full path specification. Allows one to
    import from anywhere, something __import__ does not do.
    """
    path, filename = os.path.split(fullpath)
    filename, ext = os.path.splitext(filename)
    sys.path.append(path)
    module = __import__(filename)
    reload(module)  # Might be out of date
    del sys.path[-1]
    return module
I’m using this snippet to import modules from paths, hope that helps
app/package_a/fun_a.py
def print_a():
    print('This is a function in dir package_a')
app/package_b/fun_b.py
from app.package_a.fun_a import print_a
def print_b():
    print('This is a function in dir package_b')
    print('going to call a function in dir package_a')
    print('-' * 30)
    print_a()
main.py
from app.package_b import fun_b
fun_b.print_b()
If you run $ python main.py it returns:
This is a function in dir package_b
going to call a function in dir package_a
------------------------------
This is a function in dir package_a
main.py does: from app.package_b import fun_b
fun_b.py does from app.package_a.fun_a import print_a
so file in folder package_b used file in folder package_a, which is what you want. Right??
Note that I have already installed mymodule, but in my installation I do not have “mymodule1”
and I would get an ImportError because it was trying to import from my installed modules.
I tried to do a sys.path.append, and that didn’t work. What did work was a sys.path.insert
if __name__ == '__main__':
    sys.path.insert(0, '../..')
So kind of a hack, but got it all to work!
So keep in mind, if you want your decision to override other paths then you need to use sys.path.insert(0, pathname) to get it to work! This was a very frustrating sticking point for me. A lot of people say to use the “append” function on sys.path, but that doesn’t work if a module with the same name can already be found earlier on the path (which I found very strange behaviour at first).
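The reason is that sys.path is searched in order and the first hit wins. The following sketch shows this with two temporary directories that each contain a module of the same name; the function name which_copy_wins, the module names and the LABEL constant are all made up for the demonstration:

```python
import importlib
import os
import sys
import tempfile


def which_copy_wins(use_insert):
    """Return the label of the module copy that the import system picks."""
    name = "pathdemo_insert" if use_insert else "pathdemo_append"
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        for directory, label in ((first, "first"), (second, "second")):
            with open(os.path.join(directory, name + ".py"), "w") as handle:
                handle.write("LABEL = %r\n" % label)
        saved_path = list(sys.path)
        try:
            sys.path.append(first)          # stands in for the installed copy
            if use_insert:
                sys.path.insert(0, second)  # searched before everything else
            else:
                sys.path.append(second)     # searched only after "first"
            importlib.invalidate_caches()
            return importlib.import_module(name).LABEL
        finally:
            # Leave the interpreter state as it was found.
            sys.path[:] = saved_path
            sys.modules.pop(name, None)
```

With append, the copy that was already findable keeps winning; with insert(0, ...), the newly added directory takes precedence.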
Let me just put this here for my own reference. I know that it is not good Python code, but I needed a script for a project I was working on and I wanted to put the script in a scripts directory.
In Python 2.5, you can switch import’s behaviour to absolute imports using a from __future__ import absolute_import directive. This absolute-import behaviour will become the default in a future version (probably Python 2.7). Once absolute imports are the default, import string will always find the standard library’s version. It’s suggested that users should begin using absolute imports as much as possible, so it’s preferable to begin writing from pkg import string in your code.
On top of what John B said, it seems like setting the __package__ variable should help, instead of changing __main__ which could screw up other things. But as far as I could test, it doesn’t completely work as it should.
I have the same problem and neither PEP 328 or 366 solve the problem completely, as both, by the end of the day, need the head of the package to be included in sys.path, as far as I could understand.
I should also mention that I did not find how to format the string that should go into those variables. Is it "package_head.subfolder.module_name" or what?
What is the difference between the search() and match() functions in the Python re module?
I’ve read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I’m hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I’ll have a better place to return with my question and it will take less time to re-learn it.
If zero or more characters at the
beginning of string match the regular expression pattern, return a
corresponding MatchObject instance.
Return None if the string does not
match the pattern; note that this is
different from a zero-length match.
Note: If you want to locate a match
anywhere in string, use search()
instead.
Scan through string looking for a
location where the regular expression
pattern produces a match, and return a
corresponding MatchObject instance.
Return None if no position in the
string matches the pattern; note that
this is different from finding a
zero-length match at some point in the
string.
So if you need to match at the beginning of the string, or to match the entire string use match. It is faster. Otherwise use search.
Python offers two different primitive
operations based on regular
expressions: match checks for a match
only at the beginning of the string,
while search checks for a match
anywhere in the string (this is what
Perl does by default).
Note that match may differ from search
even when using a regular expression
beginning with '^': '^' matches only
at the start of the string, or in
MULTILINE mode also immediately
following a newline. The “match”
operation succeeds only if the pattern
matches at the start of the string
regardless of mode, or at the starting
position given by the optional pos
argument regardless of whether a
newline precedes it.
Now, enough talk. Time to see some example code:
# example code:
string_with_newlines = """something
someotherthing"""
import re
print(re.match('some', string_with_newlines))  # matches
print(re.match('someother', string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines, re.MULTILINE))  # also won't match
print(re.search('someother', string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines, re.MULTILINE))  # also finds something
m = re.compile('thing$', re.MULTILINE)
print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# The second argument of a compiled pattern's search() is pos, not flags,
# so the flag is not repeated here.
print(m.search(string_with_newlines))  # also matches
re.search searches for the pattern throughout the string, whereas re.match does not search at all: it can only match the pattern at the start of the string.
match is much faster than search, so instead of doing regex.search(“word”) you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.
import random
import re
import string
import time
LENGTH = 10
LIST_SIZE = 1000000
def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word
wordlist = [generate_word() for _ in range(LIST_SIZE)]
start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)
start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)
I made 10 measurements (1M, 2M, …, 10M words) which gave me the following plot:
The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.
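Answer 4
You can refer to the following example to understand how re.match and re.search work:
import re
a = "123abc"
t = re.match("[a-z]+", a)   # None: the string does not start with a letter
t = re.search("[a-z]+", a)  # match object for "abc"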
The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)
More soberly, as John D. Cook remarks, re.match() “behaves as if every pattern has ^ prepended.” In other words, re.match('pattern') equals re.search('^pattern'). So it anchors a pattern’s left side. But it doesn’t anchor a pattern’s right side: that still requires a terminating $.
Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.
re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
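The anchoring behaviour described above can be checked with a short sketch. It uses only the standard re module; the sample string and variable names are made up for illustration.

```python
import re

text = "123abc"

# re.match only tries the pattern at position 0, so letters are not found.
at_start = re.match(r"[a-z]+", text)

# re.search scans forward and finds "abc" starting at index 3.
anywhere = re.search(r"[a-z]+", text)

# match anchors the left side only: trailing text after the digits is allowed.
left_only = re.match(r"\d+", text)

# re.fullmatch (Python 3.4+) requires the whole string to match the pattern.
whole = re.fullmatch(r"\d+", text)

print(at_start)           # None
print(anywhere.group())   # abc
print(left_only.group())  # 123
print(whole)              # None
```

If both ends need to be anchored, re.fullmatch (or an explicit trailing $) is one way to get that without relying on re.match alone.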
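Answer 7
Much shorter:
search scans through the whole string.
match only scans the beginning of the string.
The following example shows this:
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'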
What would be your preferred way to concatenate strings from a sequence such that between every two consecutive pairs a comma is added. That is, how do you map, for instance, ['a', 'b', 'c'] to 'a,b,c'? (The cases ['s'] and [] should be mapped to 's' and '', respectively.)
I usually end up using something like ''.join(map(lambda x: x+',',l))[:-1], but also feeling somewhat unsatisfied.
Obviously it gets more complicated if you need to quote/escape commas etc in the values. In that case I would suggest looking at the csv module in the standard library:
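As a rough sketch of what the csv module does for values that themselves contain commas or quotes, the following writes one row to an in-memory buffer and reads it back. The sample row is invented for illustration.

```python
import csv
import io

row = ["a", "b,c", 'say "hi"']

# Write the row into an in-memory text buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Drop the line terminator that the writer appends.
line = buffer.getvalue().rstrip("\r\n")
print(line)

# Reading the line back recovers the original values.
parsed = next(csv.reader([line]))
print(parsed)
```

Fields containing the delimiter or the quote character are quoted automatically, which a plain ','.join(...) cannot do.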
Using generator expressions has the benefit of also producing an iterator but saves importing itertools. Furthermore, list comprehensions are generally preferred to map, thus, I’d expect generator expressions to be preferred to imap.
>>> l = [1, "foo", 4 ,"bar"]
>>> ",".join(str(bit) for bit in l)
'1,foo,4,bar'
Answer 6
>>> my_list = ['A', '', '', 'D', 'E',]
>>> ",".join([str(i) for i in my_list if i])
'A,D,E'
@jmanning2k using a list comprehension has the downside of creating a new temporary list. The better solution would be using itertools.imap which returns an iterator
from itertools import imap
l = [1, "foo", 4 ,"bar"]
",".join(imap(str, l))
Answer 9
Here is an example with a list:
>>> myList = [['Apple'],['Orange']]
>>> myList = ','.join(map(str, [i[0] for i in myList]))
>>> print "Output:", myList
Output: Apple,Orange
More Accurate:-
>>> myList = [['Apple'],['Orange']]
>>> myList = ','.join(map(str, [type(i) == list and i[0] for i in myList]))
>>> print "Output:", myList
Output: Apple,Orange
is safer and quite Pythonic, though the resulting string will be difficult to parse if the elements can contain commas — at that point, you need the full power of the csv module, as Douglas points out in his answer.)
Answer 12
My two cents. I like simpler one-line Python code:
>>> from itertools import imap, ifilter
>>> l = ['a', '', 'b', 1, None]
>>> ','.join(imap(str, ifilter(lambda x: x, l)))
'a,b,1'
>>> m = ['a', '', None]
>>> ','.join(imap(str, ifilter(lambda x: x, m)))
'a'
It’s pythonic, works for strings, numbers, None and empty string. It’s short and satisfies the requirements. If the list is not going to contain numbers, we can use this simpler variation:
>>> ','.join(ifilter(lambda x: x, l))
Also, this solution doesn’t create a new list but uses an iterator, as @Peter Hoffmann pointed out (thanks).
There are two string methods for this, find() and index(). The difference between the two is what happens when the search string isn’t found. find() returns -1 and index() raises ValueError.
Using find()
>>> myString = 'Position of a character'
>>> myString.find('s')
2
>>> myString.find('x')
-1
Using index()
>>> myString = 'Position of a character'
>>> myString.index('s')
2
>>> myString.index('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
string.find(s, sub[, start[, end]])
Return the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
And:
string.index(s, sub[, start[, end]])
Like find() but raise ValueError when the substring is not found.
Answer 1
Just for completeness, if you need to find all positions of a character in a string, you can do the following:
s = 'shak#spea#e'
c = '#'
print([pos for pos, char in enumerate(s) if char == c])
Just for completion, in the case I want to find the extension in a file name in order to check it, I need to find the last ‘.’, in this case use rfind:
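A minimal sketch of the rfind approach for file extensions; the file names are made up for illustration, and os.path.splitext is shown only as a standard-library alternative.

```python
import os.path

filename = "archive.tar.gz"

# rfind returns the index of the LAST occurrence, or -1 if there is none.
dot = filename.rfind(".")
extension = filename[dot + 1:] if dot != -1 else ""
print(dot, extension)

# A name without any dot yields -1, so no extension is reported.
no_dot = "README".rfind(".")
print(no_dot)

# The standard library offers the same split, keeping the dot.
root, ext = os.path.splitext(filename)
print(root, ext)
```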
A character might appear multiple times in a string. For example, in the string "sentence", the positions of e are 1, 4, 7 (because indexing usually starts from zero). But both find() and index() return only the first position of a character. This can be solved like this:
def charposition(string, char):
    pos = []  # list to store positions for each 'char' in 'string'
    for n in range(len(string)):
        if string[n] == char:
            pos.append(n)
    return pos
s = "sentence"
print(charposition(s, 'e'))
#Output: [1, 4, 7]
To read user input you can try the cmd module for easily creating a mini-command line interpreter (with help texts and autocompletion) and raw_input (input for Python 3+) for reading a line of text from the user.
text = raw_input("prompt") # Python 2
text = input("prompt") # Python 3
Command line inputs are in sys.argv. Try this in your script:
import sys
print (sys.argv)
There are two modules for parsing command line options: optparse (deprecated since Python 2.7, use argparse instead) and getopt. If you just want to input files to your script, behold the power of fileinput.
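As a small illustration of fileinput, the sketch below iterates over the lines of a file passed explicitly. The temporary sample file exists only to keep the example self-contained; in a real script you would typically call fileinput.input() with no arguments so that it reads the files named in sys.argv[1:] (or standard input if none are given).

```python
import fileinput
import os
import tempfile

# Create a small sample file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("first\nsecond\n")
    sample_path = handle.name

lines = []
# fileinput.input() yields lines from each listed file in turn.
with fileinput.input(files=[sample_path]) as stream:
    for line in stream:
        # filelineno() is the line number within the current file.
        lines.append((fileinput.filelineno(), line.rstrip("\n")))

os.remove(sample_path)
print(lines)
```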
In Python 2, be careful not to use the input function unless you know what you’re doing. Unlike raw_input, Python 2’s input will accept any Python expression, so it’s kinda like eval.
Answer 5
This simple program helps you in understanding how to feed the user input from command line and to show help on passing invalid argument.
import argparse
import sys
try:
    parser = argparse.ArgumentParser()
    parser.add_argument("square", help="display a square of a given number",
                        type=int)
    args = parser.parse_args()
    # print the square of user input from cmd line.
    print(args.square**2)
    # print all the sys arguments passed from cmd line including the program name.
    print(sys.argv)
    # print the second argument passed from cmd line; note that indexing starts from ZERO
    print(sys.argv[1])
except:
    # a bare except also catches the SystemExit that argparse raises on invalid arguments
    e = sys.exc_info()[0]
    print(e)
If you are running Python <2.7, you need optparse, which as the doc explains will create an interface to the command line arguments that are called when your application is run.
However, in Python ≥2.7, optparse has been deprecated, and was replaced with the argparse as shown above. A quick example from the docs…
The following code is a Python program that takes a list of integers
and produces either the sum or the max:
import argparse
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
const=sum, default=max,
help='sum the integers (default: find the max)')
args = parser.parse_args()
print(args.accumulate(args.integers))
I’m able to update pip-managed packages, but how do I update pip itself? According to pip --version, I currently have pip 1.1 installed in my virtualenv and I want to update to the latest version.
What’s the command for that? Do I need to use distribute or is there a native pip or virtualenv command? I’ve already tried pip update and pip update pip with no success.
I tried all of the solutions mentioned above under Debian Jessie. They don’t work, because they just take the latest version compiled by the Debian package manager, which is 1.5.6, which equates to version 6.0.x. As a result, some packages that use pip as a prerequisite will not work, such as spaCy (which needs the option --no-cache-dir to function correctly).
So the actual best way to solve these problems is to run get-pip.py downloaded using wget, from the website or using curl as follows:
To get this to work for me I had to drill down in the Python directory using the Python command prompt (on WIN10 from VS CODE). In my case it was in my “AppData\Local\Programs\Python\python35-32” directory. From there now I ran the command…
Open Command Prompt with Administrator Permissions, and repeat the command:
python -m pip install --upgrade pip
Answer 8
pip version 10 has an issue. It will manifest as the error:
ubuntu@mymachine-:~/mydir$ sudo pip install --upgrade pip
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main
The solution is to be in the venv you want to upgrade and then run:
In case you are using venv any update to pip install will result in upgrading the system pip instead of the venv pip. You need to upgrade the pip bootstrapping packages as well.
I had installed Python in C:\Python\Python36, so I went to the Windows command prompt and typed “cd C:\Python\Python36” to get to the right directory. Then I entered “python -m pip install --upgrade pip” and all was good!
Single Line Python Program
The best way I have found is to write a single line program that downloads and runs the official get-pip script. See below for the code.
The official docs recommend using curl to download the get-pip script, but since I work on windows and don’t have curl installed I prefer using python itself to download and run the script.
Here is the single line program that can be run via the command line using Python 3:
Precautions
It’s worth noting that running any python script blindly is inherently dangerous. For this reason, the official instructions recommend downloading the script and inspecting it before running.
That said, many people don’t actually inspect the code and just run it. This one-line program makes that easier.
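In the spirit of the precaution above, here is a sketch that uses Python itself to download the script to a local file so it can be inspected before being run. It is not the one-line program referred to earlier; the helper name download_get_pip and the default destination are assumptions made for illustration, and the URL is the get-pip address mentioned elsewhere on this page.

```python
import urllib.request

# Address of the official get-pip script.
GET_PIP_URL = "https://bootstrap.pypa.io/get-pip.py"


def download_get_pip(destination="get-pip.py"):
    """Download the get-pip script to a local file and return its path.

    Hypothetical helper: it only saves the script, it does not execute it.
    """
    with urllib.request.urlopen(GET_PIP_URL) as response:
        script = response.read()
    with open(destination, "wb") as handle:
        handle.write(script)
    return destination


# Usage (needs network access, so it is not executed here):
#   path = download_get_pip()
#   ... read through the file, then run:  python get-pip.py
```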
Very simple. Just download pip from https://bootstrap.pypa.io/get-pip.py . Save the file in some folder or on the desktop. I saved the file on my D drive. Then, from your command prompt, navigate to the folder where you have downloaded pip. Then type there
In Python 3+, many processes that iterate over iterables return iterators themselves. In most cases, this ends up saving memory, and should make things go faster.
If all you’re going to do is iterate over this list eventually, there’s no need to even convert it to a list, because you can still iterate over the map object like so:
# Prints "A", "B", "C", "D", one per line
for ch in map(chr, [65, 66, 67, 68]):
    print(ch)
Always seeking for shorter ways, I discovered this one also works:
*map(chr, [66, 53, 0, 94]),
Unpacking works in tuples too. Note the comma at the end. This makes it a tuple of 1 element. That is, it’s equivalent to (*map(chr, [66, 53, 0, 94]),)
It’s shorter by only one char from the version with the list-brackets, but, in my opinion, better to write, because you start right ahead with the asterisk – the expansion syntax, so I feel it’s softer on the mind. :)
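A short sketch comparing the two unpacking spellings discussed above (both need Python 3.5 or later). The variable names are made up for illustration.

```python
codes = [66, 53, 0, 94]

# Star-unpacking with a trailing comma builds a tuple.
as_tuple = *map(chr, codes),

# The same unpacking inside list brackets builds a list.
as_list = [*map(chr, codes)]

print(as_tuple)
print(as_list)
```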
A list-returning map function has the advantage of saving typing, especially during interactive sessions. You can define an lmap function (by analogy with Python 2’s imap) that returns a list:
Then calling lmap instead of map will do the job:
lmap(str, x) is shorter by 5 characters (30% in this case) than list(map(str, x)) and is certainly shorter than [str(v) for v in x]. You may create similar functions for filter too.
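A minimal sketch of such a helper, written as a plain function rather than a lambda. The name lfilter for the filter counterpart is an assumption chosen by analogy.

```python
def lmap(func, *iterables):
    """Like map(), but returns a list instead of an iterator."""
    return list(map(func, *iterables))


def lfilter(func, iterable):
    """Hypothetical counterpart for filter(), also returning a list."""
    return list(filter(func, iterable))


x = [1, 2, 3]
print(lmap(str, x))
print(lfilter(lambda v: v > 1, x))
```

Unlike rebinding the name map itself, this leaves the builtin untouched.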
There was a comment to the original question:
I would suggest a rename to Getting map() to return a list in Python 3.* as it applies to all Python3 versions. Is there a way to do this? – meawoppl Jan 24 at 17:58
It is possible to do that, but it is a very bad idea. Just for fun, here’s how you may (but should not) do it:
__global_map = map #keep reference to the original map
lmap = lambda func, *iterable: list(__global_map(func, *iterable)) # using "map" here will cause infinite recursion
map = lmap
x = [1, 2, 3]
map(str, x) #test
map = __global_map #restore the original map and don't do that again
map(str, x) #iterator
Converting my old comment for better visibility: For a “better way to do this” without map entirely, if your inputs are known to be ASCII ordinals, it’s generally much faster to convert to bytes and decode, a la bytes(list_of_ordinals).decode('ascii'). That gets you a str of the values, but if you need a list for mutability or the like, you can just convert it (and it’s still faster). For example, in ipython microbenchmarks converting 45 inputs:
If you leave it as a str, it takes ~20% of the time of the fastest map solutions; even converting back to list it’s still less than 40% of the fastest map solution. Bulk convert via bytes and bytes.decode then bulk converting back to list saves a lot of work, but as noted, only works if all your inputs are ASCII ordinals (or ordinals in some one byte per character locale specific encoding, e.g. latin-1).
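The bulk conversion described above can be sketched as follows. The input list is made up for illustration, and no timings are claimed here.

```python
ordinals = [66, 53, 0, 94]

# Build a bytes object from the integers, then decode it in one step.
text = bytes(ordinals).decode("ascii")

# Convert to a list only if a mutable sequence of characters is needed.
chars = list(text)

print(repr(text))
print(chars)
```

Note that bytes() rejects values outside range(256) and the "ascii" codec rejects values above 127, so this only applies to the restricted inputs the text mentions.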
map(func, *iterables) –> map object
Make an iterator that computes the function using arguments from
each of the iterables. Stops when the shortest iterable is exhausted.
“Make an iterator”
means it will return an iterator.
“that computes the function using arguments from each of the iterables”
means that the next() function of the iterator will take one value from each of the iterables and pass each of them to one positional parameter of the function.
So you get an iterator from the map() function and just pass it to the list() builtin function or use a list comprehension.
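A small sketch of the behaviour quoted from the docstring: one value is taken from each iterable per step, iteration stops at the shortest iterable, and the iterator is consumed as it is read. The inputs are made up for illustration.

```python
# pow receives one argument from each iterable: pow(2, 2), pow(3, 2), ...
powers = map(pow, [2, 3, 4], [2, 2])

first = next(powers)      # computes pow(2, 2)
rest = list(powers)       # consumes what is left: pow(3, 2) only
exhausted = list(powers)  # nothing remains once the iterator is used up

print(first, rest, exhausted)
```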
Answer 6
In addition to above answers in Python 3, we may simply create a list of result values from a map as
li = []
for x in map(chr, [66, 53, 0, 94]):
    li.append(x)
print(li)
>>>['B', '5', '\x00', '^']
We can generalize with another example where I was stuck: operations on a map can be handled in a similar fashion. For instance, in a regex problem, we can write a function to obtain the list of items to map and get the result set at the same time. Ex.
import re
b = 'Strings: 1,072, Another String: 474 '
li = []
for x in map(int, re.findall(r'\d+', b)):
    li.append(x)
print(li)
>>>[1, 72, 474]
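Answer 7
You can try getting a list from the map object by just iterating over each item in the object and storing it in a different variable.
a = map(chr, [66, 53, 0, 94])
b = [item for item in a]
print(b)
>>>['B', '5', '\x00', '^']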