I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)
I pretty much screwed up on the first interview problem…
Question: Given a string of a million numbers (Pi for example), write
a function/program that returns all repeating 3 digit numbers and number of
repetition greater than 1
For example: if the string was: 123412345123456 then the function/program would return:
123 - 3 times
234 - 3 times
345 - 2 times
They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:
000 –> 999
Now that I’m thinking about it, I don’t think it’s possible to come up with a constant time algorithm. Is it?
#include<stdio.h>#include<string.h>int main(void){staticchar inpStr[100000000+1];staticint freq[1000];// Set up test data.
memset(inpStr,'1',sizeof(inpStr));
inpStr[sizeof(inpStr)-1]='\0';// Need at least three digits to do anything useful.if(strlen(inpStr)<=2)return0;// Get initial feed from first two digits, process others.int val =(inpStr[0]-'0')*10+ inpStr[1]-'0';char*inpPtr =&(inpStr[2]);while(*inpPtr !='\0'){// Remove hundreds, add next digit as units, adjust table.
val =(val %100)*10+*inpPtr++-'0';
freq[val]++;}// Output (relevant part of) table.for(int i =0; i <1000;++i)if(freq[i]>1)
printf("%3d -> %d\n", i, freq[i]);return0;}
You got off lightly, you probably don’t want to be working for a hedge fund where the quants don’t understand basic algorithms :-)
There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.
Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that’s not usually how people use complexity analysis.
It appears to me you could have impressed them in a number of ways.
First, by informing them that it’s not possible to do it in O(1), unless you use the “suspect” reasoning given above.
Second, by showing your elite skills by providing Pythonic code such as:
inpStr = '123412345123456'
# O(1) array creation.
freq = [0] * 1000
# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
freq[val] += 1
# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])
This outputs:
[(123, 3), (234, 3), (345, 2)]
though you could, of course, modify the output format to anything you desire.
And, finally, by telling them there’s almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.
And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.
Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):
vv
123412 vv
123451
5123456
You can farm these out to separate workers and combine the results afterwards.
The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of “measure, don’t guess” applies here, of course.
This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.
For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:
#include <stdio.h>
#include <string.h>
int main(void) {
static char inpStr[100000000+1];
static int freq[1000];
// Set up test data.
memset(inpStr, '1', sizeof(inpStr));
inpStr[sizeof(inpStr)-1] = '\0';
// Need at least three digits to do anything useful.
if (strlen(inpStr) <= 2) return 0;
// Get initial feed from first two digits, process others.
int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
char *inpPtr = &(inpStr[2]);
while (*inpPtr != '\0') {
// Remove hundreds, add next digit as units, adjust table.
val = (val % 100) * 10 + *inpPtr++ - '0';
freq[val]++;
}
// Output (relevant part of) table.
for (int i = 0; i < 1000; ++i)
if (freq[i] > 1)
printf("%3d -> %d\n", i, freq[i]);
return 0;
}
Constant time isn’t possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.
For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.
from collections importCounterdef triple_counter(s):
c =Counter(s[n-3: n]for n in range(3, len(s)))for tri, n in c.most_common():if n >1:print('%s - %i times.'%(tri, n))else:breakif __name__ =='__main__':import random
s =''.join(random.choice('0123456789')for _ in range(1_000_000))
triple_counter(s)
A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:
from collections import Counter
def triple_counter(s):
c = Counter(s[n-3: n] for n in range(3, len(s)))
for tri, n in c.most_common():
if n > 1:
print('%s - %i times.' % (tri, n))
else:
break
if __name__ == '__main__':
import random
s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
triple_counter(s)
Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.
Parallel execution version
I wrote a blog post on this with more explanation.
回答 3
简单的O(n)解决方案是对每个3位数字进行计数:
for nr in range(1000):
cnt = text.count('%03d'% nr)if cnt >1:print'%03d is found %d times'%(nr, cnt)
这将搜索全部100万个数字1000次。
仅遍历数字一次:
counts =[0]*1000for idx in range(len(text)-2):
counts[int(text[idx:idx+3])]+=1for nr, cnt in enumerate(counts):if cnt >1:print'%03d is found %d times'%(nr, cnt)
The simple O(n) solution would be to count each 3-digit number:
for nr in range(1000):
cnt = text.count('%03d' % nr)
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
This would search through all 1 million digits 1000 times.
Traversing the digits only once:
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
for nr, cnt in enumerate(counts):
if cnt > 1:
print '%03d is found %d times' % (nr, cnt)
Timing shows that iterating only once over the index is twice as fast as using count.
def setup_data(n):import random
digits ="0123456789"return dict(text =''.join(random.choice(digits)for i in range(n)))def f_np(text):# Get the data into NumPyimport numpy as np
a = np.frombuffer(bytes(text,'utf8'), dtype=np.uint8)- ord('0')# Rolling triplets
a3 = np.lib.stride_tricks.as_strided(a,(3, a.size-2),2*a.strides)
bins = np.zeros((10,10,10), dtype=int)# Next line performs O(n) binning
np.add.at(bins, tuple(a3),1)# Filtering is left as an exercisereturn bins.ravel()def f_py(text):
counts =[0]*1000for idx in range(len(text)-2):
counts[int(text[idx:idx+3])]+=1return counts
import numpy as np
import types
from timeit import timeit
for n in(10,1000,1000000):
data = setup_data(n)
ref = f_np(**data)print(f'n = {n}')for name, func in list(globals().items()):ifnot name.startswith('f_')ornot isinstance(func, types.FunctionType):continuetry:assert np.all(ref == func(**data))print("{:16s}{:16.8f} ms".format(name[2:], timeit('f(**data)', globals={'f':func,'data':data}, number=10)*100))except:print("{:16s} apparently crashed".format(name[2:]))
毫不奇怪,在大型数据集上,NumPy比@Daniel的纯Python解决方案要快一点。样本输出:
# n = 10# np 0.03481400 ms# py 0.00669330 ms# n = 1000# np 0.11215360 ms# py 0.34836530 ms# n = 1000000# np 82.46765980 ms# py 360.51235450 ms
Here is a NumPy implementation of the “consensus” O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say “385”, adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.
def setup_data(n):
import random
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_np(text):
# Get the data into NumPy
import numpy as np
a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
# Rolling triplets
a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)
bins = np.zeros((10, 10, 10), dtype=int)
# Next line performs O(n) binning
np.add.at(bins, tuple(a3), 1)
# Filtering is left as an exercise
return bins.ravel()
def f_py(text):
counts = [0] * 1000
for idx in range(len(text)-2):
counts[int(text[idx:idx+3])] += 1
return counts
import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
data = setup_data(n)
ref = f_np(**data)
print(f'n = {n}')
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
try:
assert np.all(ref == func(**data))
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
except:
print("{:16s} apparently crashed".format(name[2:]))
Unsurprisingly, NumPy is a bit faster than @Daniel’s pure Python solution on large data sets. Sample output:
# n = 10
# np 0.03481400 ms
# py 0.00669330 ms
# n = 1000
# np 0.11215360 ms
# py 0.34836530 ms
# n = 1000000
# np 82.46765980 ms
# py 360.51235450 ms
回答 5
我将解决以下问题:
def find_numbers(str_num):
final_dict ={}
buffer ={}for idx in range(len(str_num)-3):
num = int(str_num[idx:idx +3])if num notin buffer:
buffer[num]=0
buffer[num]+=1if buffer[num]>1:
final_dict[num]= buffer[num]return final_dict
def find_numbers(str_num):
final_dict = {}
buffer = {}
for idx in range(len(str_num) - 3):
num = int(str_num[idx:idx + 3])
if num not in buffer:
buffer[num] = 0
buffer[num] += 1
if buffer[num] > 1:
final_dict[num] = buffer[num]
return final_dict
As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn’t exists already in the dictionary.
The code will look something like this:
def calc_repeating_digits(number):
hash = {}
for i in range(len(str(number))-2):
current_three_digits = number[i:i+3]
if current_three_digits in hash.keys():
hash[current_three_digits] += 1
else:
hash[current_three_digits] = 1
return hash
You can filter down to the keys which have item value greater than 1.
As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.
However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.
My guess is that either the interviewer misspoke when they gave you the solution, or you misheard “constant time” when they said “constant space.”
回答 8
这是我的答案:
from timeit import timeit
from collections importCounterimport types
import random
def setup_data(n):
digits ="0123456789"return dict(text =''.join(random.choice(digits)for i in range(n)))def f_counter(text):
c =Counter()for i in range(len(text)-2):
ss = text[i:i+3]
c.update([ss])return(i for i in c.items()if i[1]>1)def f_dict(text):
d ={}for i in range(len(text)-2):
ss = text[i:i+3]if ss notin d:
d[ss]=0
d[ss]+=1return((i, d[i])for i in d if d[i]>1)def f_array(text):
a =[[[0for _ in range(10)]for _ in range(10)]for _ in range(10)]for n in range(len(text)-2):
i, j, k =(int(ss)for ss in text[n:n+3])
a[i][j][k]+=1for i, b in enumerate(a):for j, c in enumerate(b):for k, d in enumerate(c):if d >1:yield(f'{i}{j}{k}', d)for n in(1E1,1E3,1E6):
n = int(n)
data = setup_data(n)print(f'n = {n}')
results ={}for name, func in list(globals().items()):ifnot name.startswith('f_')ornot isinstance(func, types.FunctionType):continueprint("{:16s}{:16.8f} ms".format(name[2:], timeit('results[name] = f(**data)', globals={'f':func,'data':data,'results':results,'name':name}, number=10)*100))for r in results:print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
n =10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter :[]
f_dict :[]
f_array :[]
n =1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter :[('008',2),('009',3),('010',2),('016',2),('017',2)]
f_dict :[('008',2),('009',3),('010',2),('016',2),('017',2)]
f_array :[('008',2),('009',3),('010',2),('016',2),('017',2)]
n =1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter :[('000',1058),('001',943),('002',1030),('003',982),('004',1042)]
f_dict :[('000',1058),('001',943),('002',1030),('003',982),('004',1042)]
f_array :[('000',1058),('001',943),('002',1030),('003',982),('004',1042)]
from timeit import timeit
from collections import Counter
import types
import random
def setup_data(n):
digits = "0123456789"
return dict(text = ''.join(random.choice(digits) for i in range(n)))
def f_counter(text):
c = Counter()
for i in range(len(text)-2):
ss = text[i:i+3]
c.update([ss])
return (i for i in c.items() if i[1] > 1)
def f_dict(text):
d = {}
for i in range(len(text)-2):
ss = text[i:i+3]
if ss not in d:
d[ss] = 0
d[ss] += 1
return ((i, d[i]) for i in d if d[i] > 1)
def f_array(text):
a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
for n in range(len(text)-2):
i, j, k = (int(ss) for ss in text[n:n+3])
a[i][j][k] += 1
for i, b in enumerate(a):
for j, c in enumerate(b):
for k, d in enumerate(c):
if d > 1: yield (f'{i}{j}{k}', d)
for n in (1E1, 1E3, 1E6):
n = int(n)
data = setup_data(n)
print(f'n = {n}')
results = {}
for name, func in list(globals().items()):
if not name.startswith('f_') or not isinstance(func, types.FunctionType):
continue
print("{:16s}{:16.8f} ms".format(name[2:], timeit(
'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
for r in results:
print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))
The array lookup method is very fast (even faster than @paul-panzer’s numpy method!). Of course, it cheats since it isn’t technicailly finished after it completes, because it’s returning a generator. It also doesn’t have to check every iteration if the value already exists, which is likely to help a lot.
n = 10
counter 0.10595780 ms
dict 0.01070654 ms
array 0.00135370 ms
f_counter : []
f_dict : []
f_array : []
n = 1000
counter 2.89462101 ms
dict 0.40434612 ms
array 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter 2849.00500992 ms
dict 438.44007806 ms
array 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
from collections import defaultdict
string ="103264685134845354863"
d = defaultdict(int)for elt in range(len(string)-2):
d[string[elt:elt+3]]+=1
d ={key: d[key]for key in d.keys()if d[key]>1}
from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}
With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point.
Hope it helps :)
-Telling from the perspective of C.
-You can have an int 3-d array results[10][10][10];
-Go from 0th location to n-4th location, where n being the size of the string array.
-On each location, check the current, next and next’s next.
-Increment the cntr as resutls[current][next][next’s next]++;
-Print the values of
-It is O(n) time, there is no comparisons involved.
-You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.
回答 12
inputStr ='123456123138276237284287434628736482376487234682734682736487263482736487236482634'
count ={}for i in range(len(inputStr)-2):
subNum = int(inputStr[i:i+3])if subNum notin count:
count[subNum]=1else:
count[subNum]+=1print count
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'
count = {}
for i in range(len(inputStr) - 2):
subNum = int(inputStr[i:i+3])
if subNum not in count:
count[subNum] = 1
else:
count[subNum] += 1
print count
In class we are doing sorting algorithms and, although I understand them fine when talking about them and writing pseudocode, I am having problems writing actual code for them.
This is my attempt in Python:
mylist = [12, 5, 13, 8, 9, 65]
def bubble(badList):
length = len(badList) - 1
unsorted = True
while unsorted:
for element in range(0,length):
unsorted = False
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
print badList
else:
unsorted = True
print bubble(mylist)
Now, this (as far as I can tell) sorts correctly, but once it finishes it just loops indefinitely.
How can this code be fixed so the function finishes properly and correctly sorts a list of any (reasonable) size?
P.S. I know I should not really have prints in a function and I should have a return, but I just have not done that yet as my code does not really work yet.
sorted =False# We haven't started sorting yetwhilenot sorted:
sorted =True# Assume the list is now sortedfor element in range(0, length):if badList[element]> badList[element +1]:
sorted =False# We found two elements in the wrong order
hold = badList[element +1]
badList[element +1]= badList[element]
badList[element]= hold# We went through the whole list. At this point, if there were no elements# in the wrong order, sorted is still True. Otherwise, it's false, and the# while loop executes again.
To explain why your script isn’t working right now, I’ll rename the variable unsorted to sorted.
At first, your list isn’t yet sorted. Of course, we set sorted to False.
As soon as we start the while loop, we assume that the list is already sorted. The idea is this: as soon as we find two elements that are not in the right order, we set sorted back to False. sorted will remain Trueonly if there were no elements in the wrong order.
sorted = False # We haven't started sorting yet
while not sorted:
sorted = True # Assume the list is now sorted
for element in range(0, length):
if badList[element] > badList[element + 1]:
sorted = False # We found two elements in the wrong order
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
# We went through the whole list. At this point, if there were no elements
# in the wrong order, sorted is still True. Otherwise, it's false, and the
# while loop executes again.
There are also minor little issues that would help the code be more efficient or readable.
In the for loop, you use the variable element. Technically, element is not an element; it’s a number representing a list index. Also, it’s quite long. In these cases, just use a temporary variable name, like i for “index”.
for i in range(0, length):
The range command can also take just one argument (named stop). In that case, you get a list of all the integers from 0 to that argument.
for i in range(length):
The Python Style Guide recommends that variables be named in lowercase with underscores. This is a very minor nitpick for a little script like this; it’s more to get you accustomed to what Python code most often resembles.
def bubble(bad_list):
To swap the values of two variables, write them as a tuple assignment. The right hand side gets evaluated as a tuple (say, (badList[i+1], badList[i]) is (3, 5)) and then gets assigned to the two variables on the left hand side ((badList[i], badList[i+1])).
def bubble(badList):
length = len(badList)for i in range(0,length):
swapped =Falsefor element in range(0, length-i-1):if badList[element]> badList[element +1]:
hold = badList[element +1]
badList[element +1]= badList[element]
badList[element]= hold
swapped =Trueifnot swapped:breakreturn badList
您的版本1已更正:
def bubble(badList):
length = len(badList)-1
unsorted =Truewhile unsorted:
unsorted =Falsefor element in range(0,length):#unsorted = Falseif badList[element]> badList[element +1]:
hold = badList[element +1]
badList[element +1]= badList[element]
badList[element]= hold
unsorted =True#print badList#else:#unsorted = Truereturn badList
The goal of bubble sort is to move the heavier items at the bottom in each round, while moving the lighter items up. In the inner loop, where you compare the elements, you don’t have to iterate the whole list in each turn. The heaviest is already placed last. The swapped variable is an extra check so we can mark that the list is now sorted and avoid continuing with unnecessary calculations.
def bubble(badList):
length = len(badList)
for i in range(0,length):
swapped = False
for element in range(0, length-i-1):
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
swapped = True
if not swapped: break
return badList
Your version 1, corrected:
def bubble(badList):
length = len(badList) - 1
unsorted = True
while unsorted:
unsorted = False
for element in range(0,length):
#unsorted = False
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
unsorted = True
#print badList
#else:
#unsorted = True
return badList
回答 2
当您使用负含义的变量名时,您需要反转它们的值,这就是发生的情况。以下内容将更容易理解:
sorted =Falsewhilenot sorted:...
另一方面,算法的逻辑有点偏离。您需要检查在for循环期间是否交换了两个元素。这是我的写法:
def bubble(values):
length = len(values)-1
sorted =Falsewhilenot sorted:
sorted =Truefor element in range(0,length):if values[element]> values[element +1]:
hold = values[element +1]
values[element +1]= values[element]
values[element]= hold
sorted =Falsereturn values
This is what happens when you use variable name of negative meaning, you need to invert their values. The following would be easier to understand:
sorted = False
while not sorted:
...
On the other hand, the logic of the algorithm is a little bit off. You need to check whether two elements swapped during the for loop. Here’s how I would write it:
def bubble(values):
length = len(values) - 1
sorted = False
while not sorted:
sorted = True
for element in range(0,length):
if values[element] > values[element + 1]:
hold = values[element + 1]
values[element + 1] = values[element]
values[element] = hold
sorted = False
return values
Your use of the Unsorted variable is wrong; you want to have a variable that tells you if you have swapped two elements; if you have done that, you can exit your loop, otherwise, you need to loop again. To fix what you’ve got here, just put the “unsorted = false” in the body of your if case; remove your else case; and put “unsorted = true before your for loop.
回答 4
def bubble_sort(l):for passes_left in range(len(l)-1,0,-1):for index in range(passes_left):if l[index]< l[index +1]:
l[index], l[index +1]= l[index +1], l[index]return l
def bubble_sort(l):
for passes_left in range(len(l)-1, 0, -1):
for index in range(passes_left):
if l[index] < l[index + 1]:
l[index], l[index + 1] = l[index + 1], l[index]
return l
You’ve got a couple of errors in there. The first is in length, and the second is in your use of unsorted (as stated by McWafflestix). You probably also want to return the list if you’re going to print it:
mylist = [12, 5, 13, 8, 9, 65]
def bubble(badList):
length = len(badList) - 2
unsorted = True
while unsorted:
for element in range(0,length):
unsorted = False
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
print badList
unsorted = True
return badList
print bubble(mylist)
eta: You’re right, the above is buggy as hell. My bad for not testing through some more examples.
def bubble2(badList):
swapped = True
length = len(badList) - 2
while swapped:
swapped = False
for i in range(0, length):
if badList[i] > badList[i + 1]:
# swap
hold = badList[i + 1]
badList[i + 1] = badList[i]
badList[i] = hold
swapped = True
return badList
I am a fresh fresh beginner, started to read about Python yesterday.
Inspired by your example I created something maybe more in the 80-ties style, but nevertheless it kinda works
lista1 = [12, 5, 13, 8, 9, 65]
i=0
while i < len(lista1)-1:
if lista1[i] > lista1[i+1]:
x = lista1[i]
lista1[i] = lista1[i+1]
lista1[i+1] = x
i=0
continue
else:
i+=1
print(lista1)
mylist =[9,8,5,4,12,1,7,5,2]print mylist
def bubble(badList):
length = len(badList)-1
element =0while element < length:if badList[element]> badList[element +1]:
hold = badList[element +1]
badList[element +1]= badList[element]
badList[element]= hold
element =0print badList
else:
element = element +1print bubble(mylist)
The problem with the original algorithm is that if you had a lower number further in the list, it would not bring it to the correct sorted position. The program needs to go back the the beginning each time to ensure that the numbers sort all the way through.
I simplified the code and it will now work for any list of numbers regardless of the list and even if there are repeating numbers. Here’s the code
mylist = [9, 8, 5, 4, 12, 1, 7, 5, 2]
print mylist
def bubble(badList):
length = len(badList) - 1
element = 0
while element < length:
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
element = 0
print badList
else:
element = element + 1
print bubble(mylist)
回答 9
def bubble_sort(l):
exchanged =True
iteration =0
n = len(l)while(exchanged):
iteration +=1
exchanged =False# Move the largest element to the end of the listfor i in range(n-1):if l[i]> l[i+1]:
exchanged =True
l[i], l[i+1]= l[i+1], l[i]
n -=1# Largest element already towards the endprint'Iterations: %s'%(iteration)return l
def bubble_sort(l):
exchanged = True
iteration = 0
n = len(l)
while(exchanged):
iteration += 1
exchanged = False
# Move the largest element to the end of the list
for i in range(n-1):
if l[i] > l[i+1]:
exchanged = True
l[i], l[i+1] = l[i+1], l[i]
n -= 1 # Largest element already towards the end
print 'Iterations: %s' %(iteration)
return l
回答 10
def bubbleSort(alist):if len(alist)<=1:return alist
for i in range(0,len(alist)):print"i is :%d",i
for j in range(0,i):print"j is:%d",j
print"alist[i] is :%d, alist[j] is :%d"%(alist[i],alist[j])if alist[i]> alist[j]:
alist[i],alist[j]= alist[j],alist[i]return alist
def bubbleSort(alist):
if len(alist) <= 1:
return alist
for i in range(0,len(alist)):
print "i is :%d",i
for j in range(0,i):
print "j is:%d",j
print "alist[i] is :%d, alist[j] is :%d"%(alist[i],alist[j])
if alist[i] > alist[j]:
alist[i],alist[j] = alist[j],alist[i]
return alist
def bubble_sort(a):
t =0
sorted =False# sorted = False because we have not began to sortwhilenot sorted:
sorted =True# Assume sorted = True first, it will switch only there is any changefor key in range(1,len(a)):if a[key-1]> a[key]:
sorted =False
t = a[key-1]; a[key-1]= a[key]; a[key]= t;print a
def bubble_sort(a):
t = 0
sorted = False # sorted = False because we have not began to sort
while not sorted:
sorted = True # Assume sorted = True first, it will switch only there is any change
for key in range(1,len(a)):
if a[key-1] > a[key]:
sorted = False
t = a[key-1]; a[key-1] = a[key]; a[key] = t;
print a
回答 12
一个简单的例子:
a = len(alist)-1while a >0:for b in range(0,a):#compare with the adjacent elementif alist[b]>=alist[b+1]:#swap both elements
alist[b], alist[b+1]= alist[b+1], alist[b]
a-=1
a = len(alist)-1
while a > 0:
for b in range(0,a):
#compare with the adjacent element
if alist[b]>=alist[b+1]:
#swap both elements
alist[b], alist[b+1] = alist[b+1], alist[b]
a-=1
This simply takes the elements from 0 to a(basically, all the unsorted elements in that round) and compares it with its adjacent element, and making a swap if it is greater than its adjacent element. At the end the round, the last element is sorted, and the process runs again without it, until all elements have been sorted.
There is no need for a condition whether sort is true or not.
Note that this algorithm takes into consideration the position of the numbers only when swapping, so repeated numbers will not affect it.
PS. I know it has been very long since this question was posted, but I just wanted to share this idea.
回答 13
arr =[5,4,3,1,6,8,10,9]# array not sortedfor i in range(len(arr)):for j in range(i, len(arr)):if(arr[i]> arr[j]):
arr[i], arr[j]= arr[j], arr[i]print(arr)
arr = [5,4,3,1,6,8,10,9] # array not sorted
for i in range(len(arr)):
for j in range(i, len(arr)):
if(arr[i] > arr[j]):
arr[i], arr[j] = arr[j], arr[i]
print (arr)
回答 14
def bubble_sort(li):
l = len(li)
tmp =None
sorted_l = sorted(li)while(li != sorted_l):for ele in range(0,l-1):if li[ele]> li[ele+1]:
tmp = li[ele+1]
li[ele+1]= li [ele]
li[ele]= tmp
return li
def bubble_sort(li):
l = len(li)
tmp = None
sorted_l = sorted(li)
while (li != sorted_l):
for ele in range(0,l-1):
if li[ele] > li[ele+1]:
tmp = li[ele+1]
li[ele+1] = li [ele]
li[ele] = tmp
return li
回答 15
def bubbleSort ( arr ):
swapped =True
length = len ( arr )
j =0while swapped:
swapped =False
j +=1for i in range ( length - j ):if arr [ i ]> arr [ i +1]:# swap
tmp = arr [ i ]
arr [ i ]= arr [ i +1]
arr [ i +1]= tmp
swapped =Trueif __name__ =='__main__':# test list
a =[67,45,39,-1,-5,-44];print( a )
bubbleSort ( a )print( a )
def bubbleSort ( arr ):
swapped = True
length = len ( arr )
j = 0
while swapped:
swapped = False
j += 1
for i in range ( length - j ):
if arr [ i ] > arr [ i + 1 ]:
# swap
tmp = arr [ i ]
arr [ i ] = arr [ i + 1]
arr [ i + 1 ] = tmp
swapped = True
if __name__ == '__main__':
# test list
a = [ 67, 45, 39, -1, -5, -44 ];
print ( a )
bubbleSort ( a )
print ( a )
回答 16
def bubblesort(array):for i in range(len(array)-1):for j in range(len(array)-1-i):if array[j]> array[j+1]:
array[j], array[j+1]= array[j+1], array[j]return(array)print(bubblesort([3,1,6,2,5,4]))
def bubblesort(array):
for i in range(len(array)-1):
for j in range(len(array)-1-i):
if array[j] > array[j+1]:
array[j], array[j+1] = array[j+1], array[j]
return(array)
print(bubblesort([3,1,6,2,5,4]))
回答 17
我考虑添加我的解决方案,因为这里曾经有解决方案
更长的时间
更大的空间复杂度
或进行过多的操作
那应该是
所以,这是我的解决方案:
def countInversions(arr):
count =0
n = len(arr)for i in range(n):
_count = count
for j in range(0, n - i -1):if arr[j]> arr[j +1]:
count +=1
arr[j], arr[j +1]= arr[j +1], arr[j]if _count == count:breakreturn count
I consider adding my solution because ever solution here is having
greater time
greater space complexity
or doing too much operations
then is should be
So, here is my solution:
def countInversions(arr):
count = 0
n = len(arr)
for i in range(n):
_count = count
for j in range(0, n - i - 1):
if arr[j] > arr[j + 1]:
count += 1
arr[j], arr[j + 1] = arr[j + 1], arr[j]
if _count == count:
break
return count
回答 18
如果有人对使用列表理解的较短实现感兴趣:
def bubble_sort(lst: list)->None:[swap_items(lst, i, i+1)for left in range(len(lst)-1,0,-1)for i in range(left)if lst[i]> lst[i+1]]def swap_items(lst: list, pos1: int, pos2: int)->None:
lst[pos1], lst[pos2]= lst[pos2], lst[pos1]
If anyone is interested in a shorter implementation using a list comprehension:
def bubble_sort(lst: list) -> None:
[swap_items(lst, i, i+1) for left in range(len(lst)-1, 0, -1) for i in range(left) if lst[i] > lst[i+1]]
def swap_items(lst: list, pos1: int, pos2: int) -> None:
lst[pos1], lst[pos2] = lst[pos2], lst[pos1]
Here is a different variation of bubble sort without for loop. Basically you are considering the lastIndex of the array and slowly decrementing it until it first index of the array.
The algorithm will continue to move through the array like this until an entire pass is made without any swaps occurring.
The bubble is sort is basically Quadratic Time: O(n²) when it comes to performance.
def bubble(badList):
length = len(badList)-1
n =0while n < len(badList):for element in range(0,length):if badList[element]> badList[element +1]:
hold = badList[element +1]
badList[element +1]= badList[element]
badList[element]= hold
n =0else:
n +=1return badList
if __name__ =='__main__':
mylist =[90,10,2,76,17,66,57,23,57,99]print bubble(mylist)
Answers provided by the-fury and Martin Cote fixed the problem of the infinite loop, but my code would still not work correctly (for a larger list, it would not sort correctly.). I ended up ditching the unsorted variable and used a counter instead.
def bubble(badList):
length = len(badList) - 1
n = 0
while n < len(badList):
for element in range(0,length):
if badList[element] > badList[element + 1]:
hold = badList[element + 1]
badList[element + 1] = badList[element]
badList[element] = hold
n = 0
else:
n += 1
return badList
if __name__ == '__main__':
mylist = [90, 10, 2, 76, 17, 66, 57, 23, 57, 99]
print bubble(mylist)
If anyone could provide any pointers on how to improve my code in the comments, it would be much appreciated.
回答 21
试试这个
a = int(input("Enter Limit"))
val =[]for z in range(0,a):
b = int(input("Enter Number in List"))
val.append(b)for y in range(0,len(val)):for x in range(0,len(val)-1):if val[x]>val[x+1]:
t = val[x]
val[x]= val[x+1]
val[x+1]= t
print(val)
a = int(input("Enter Limit"))
val = []
for z in range(0,a):
b = int(input("Enter Number in List"))
val.append(b)
for y in range(0,len(val)):
for x in range(0,len(val)-1):
if val[x]>val[x+1]:
t = val[x]
val[x] = val[x+1]
val[x+1] = t
print(val)
回答 22
idk如果这可能会在9年后对您有所帮助…它是一个简单的气泡排序程序
l=[1,6,3,7,5,9,8,2,4,10]for i in range(1,len(l)):for j in range (i+1,len(l)):if l[i]>l[j]:
l[i],l[j]=l[j],l[i]
def merge_bubble(arr):
k = len(arr)
while k>2:
for i in range(0,k-1):
for j in range(0,k-1):
if arr[j] > arr[j+1]:
arr[j],arr[j+1] = arr[j+1],arr[j]
return arr
break
else:
if arr[0] > arr[1]:
arr[0],arr[1] = arr[1],arr[0]
return arr
回答 24
def bubble_sort(l):for i in range(len(l)-1):for j in range(len(l)-i-1):if l[j]> l[j+1]:
l[j],l[j+1]= l[j+1], l[j]return l
def bubble_sorted(arr:list):
while True:
for i in range(0,len(arr)-1):
count = 0
if arr[i] > arr[i+1]:
count += 1
arr[i], arr[i+1] = arr[i+1], arr[i]
if count == 0:
break
return arr
arr = [30,20,80,40,50,10,60,70,90]
print(bubble_sorted(arr))
#[20, 30, 40, 50, 10, 60, 70, 80, 90]
回答 26
def bubbleSort(a):
def swap(x, y):
temp = a[x]
a[x] = a[y]
a[y] = temp
#outer loop
for j in range(len(a)):
#slicing to the center, inner loop, python style
for i in range(j, len(a) - j):
#find the min index and swap
if a[i] < a[j]:
swap(j, i)
#find the max index and swap
if a[i] > a[len(a) - j - 1]:
swap(len(a) - j - 1, i)
return a
def bubbleSort(a):
def swap(x, y):
temp = a[x]
a[x] = a[y]
a[y] = temp
#outer loop
for j in range(len(a)):
#slicing to the center, inner loop, python style
for i in range(j, len(a) - j):
#find the min index and swap
if a[i] < a[j]:
swap(j, i)
#find the max index and swap
if a[i] > a[len(a) - j - 1]:
swap(len(a) - j - 1, i)
return a
I need an algorithm that can give me positions around a sphere for N points (less than 20, probably) that vaguely spreads them out. There’s no need for “perfection”, but I just need it so none of them are bunched together.
This question provided good code, but I couldn’t find a way to make this uniform, as this seemed 100% randomized.
This blog post recommended had two ways allowing input of number of points on the sphere, but the Saff and Kuijlaars algorithm is exactly in psuedocode I could transcribe, and the code example I found contained “node[k]”, which I couldn’t see explained and ruined that possibility. The second blog example was the Golden Section Spiral, which gave me strange, bunched up results, with no clear way to define a constant radius.
This algorithm from this question seems like it could possibly work, but I can’t piece together what’s on that page into psuedocode or anything.
A few other question threads I came across spoke of randomized uniform distribution, which adds a level of complexity I’m not concerned about. I apologize that this is such a silly question, but I wanted to show that I’ve truly looked hard and still come up short.
So, what I’m looking for is simple pseudocode to evenly distribute N points around a unit sphere, that either returns in spherical or Cartesian coordinates. Even better if it can even distribute with a bit of randomization (think planets around a star, decently spread out, but with room for leeway).
> cat ll.py
from math import asin
nx =4; ny =5for x in range(nx):
lon =360*((x+0.5)/ nx)for y in range(ny):
midpt =(y+0.5)/ ny
lat =180* asin(2*((y+0.5)/ny-0.5))print lon,lat
> python2.7 ll.py
45.0-166.9131392445.0-74.073032292145.00.045.074.073032292145.0166.91313924135.0-166.91313924135.0-74.0730322921135.00.0135.074.0730322921135.0166.91313924225.0-166.91313924225.0-74.0730322921225.00.0225.074.0730322921225.0166.91313924315.0-166.91313924315.0-74.0730322921315.00.0315.074.0730322921315.0166.91313924
In this example codenode[k] is just the kth node. You are generating an array N points and node[k] is the kth (from 0 to N-1). If that is all that is confusing you, hopefully you can use that now.
(in other words, k is an array of size N that is defined before the code fragment starts, and which contains a list of the points).
Alternatively, building on the other answer here (and using Python):
> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
lon = 360 * ((x+0.5) / nx)
for y in range(ny):
midpt = (y+0.5) / ny
lat = 180 * asin(2*((y+0.5)/ny-0.5))
print lon,lat
> python2.7 ll.py
45.0 -166.91313924
45.0 -74.0730322921
45.0 0.0
45.0 74.0730322921
45.0 166.91313924
135.0 -166.91313924
135.0 -74.0730322921
135.0 0.0
135.0 74.0730322921
135.0 166.91313924
225.0 -166.91313924
225.0 -74.0730322921
225.0 0.0
225.0 74.0730322921
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924
If you plot that, you’ll see that the vertical spacing is larger near the poles so that each point is situated in about the same total area of space (near the poles there’s less space “horizontally”, so it gives more “vertically”).
This isn’t the same as all points having about the same distance to their neighbours (which is what I think your links are talking about), but it may be sufficient for what you want and improves on simply making a uniform lat/lon grid.
import math
def fibonacci_sphere(samples=1):
points =[]
phi = math.pi *(3.- math.sqrt(5.))# golden angle in radiansfor i in range(samples):
y =1-(i / float(samples -1))*2# y goes from 1 to -1
radius = math.sqrt(1- y * y)# radius at y
theta = phi * i # golden angle increment
x = math.cos(theta)* radius
z = math.sin(theta)* radius
points.append((x, y, z))return points
import math
def fibonacci_sphere(samples=1):
points = []
phi = math.pi * (3. - math.sqrt(5.)) # golden angle in radians
for i in range(samples):
y = 1 - (i / float(samples - 1)) * 2 # y goes from 1 to -1
radius = math.sqrt(1 - y * y) # radius at y
theta = phi * i # golden angle increment
x = math.cos(theta) * radius
z = math.sin(theta) * radius
points.append((x, y, z))
return points
from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp
num_pts =100
indices = arange(0, num_pts, dtype=float)+0.5
r = sqrt(indices/num_pts)
theta = pi *(1+5**0.5)* indices
pp.scatter(r*cos(theta), r*sin(theta))
pp.show()
我们的区域元素,这是[R d [R d θ,现在变成了没有,备受更复杂的罪孽(φ)d φ d θ。因此,我们对统一的间距联合密度是罪(φ)/4π。积分出θ,我们发现˚F(φ)= SIN(φ)/ 2,从而˚F(φ)=(1 – COS(φ))/ 2。反相此我们可以看到,一个均匀随机变量看起来像ACOS(1 – 2 ü),但我们采样均匀,而不是随机的,所以我们改为使用φ ķ = ACOS(1 – 2( ķ+ 0.5)/ N)。算法的其余部分只是将其投影到x,y和z坐标上:
from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp
num_pts =1000
indices = arange(0, num_pts, dtype=float)+0.5
phi = arccos(1-2*indices/num_pts)
theta = pi *(1+5**0.5)* indices
x, y, z = cos(theta)* sin(phi), sin(theta)* sin(phi), cos(phi);
pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()
一旦您知道这是结果,证明就很简单。如果您问z < Z < z + d z的概率是什么,这与问z < F -1(U)< z + d z的概率是什么,将F应用于所有三个表达式表示它是一个单调递增的函数,因此F(z)< U < F(z + d z),向外扩展右侧以找到F(z)+ f(z)d z,并且由于U是均匀的,因此如所承诺的,该概率仅为f(z)d z。
You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”
So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.
Primer: sunflower spirals on the unit disk
To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2 and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on.(Note 1.)
The algorithm for even spacing on a disk is,
from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp
num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5
r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices
pp.scatter(r*cos(theta), r*sin(theta))
pp.show()
and it produces results that look like (n=100 and n=1000):
Spacing the points radially
The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? (Note 2.)
Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.
Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?
First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
Then calculate the CDF’s inverse function F-1(z).
You will find that Z = F-1(U) is distributed according to the target density. (Note 3).
Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ so let’s integrate that out; for the unit disk we are left with F(r) = r2. So the inverse function is F-1(u) = u1/2, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().
Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95. We uniformly sample r to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.
Now doing the sunflower on a sphere
The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn’t enter into this because we’re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I’ll use mathematicians’ coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.
Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 – 2 u), but we sample uniformly instead of randomly, so we instead use φk = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:
from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp
num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5
phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices
x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);
pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()
Again for n=100 and n=1000 the results look like:
Further research
I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R2 sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps as-is but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.
Notes
Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:
def continued_fraction(r):
while r != 0:
n = floor(r)
yield n
r = 1/(r - n)
Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.
For folks who are not so familiar with NumPy — all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars — those apply to all components in parallel.
The proof is simple once you know that this is the result. If you ask what’s the probability that z < Z < z + dz, this is the same as asking what’s the probability that z < F-1(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.
This is known as packing points on a sphere, and there is no (known) general, perfect solution. However, there are plenty of imperfect solutions. The three most popular seem to be:
Create a simulation. Treat each point as an electron constrained to a sphere, then run a simulation for a certain number of steps. The electrons’ repulsion will naturally tend the system to a more stable state, where the points are about as far away from each other as they can get.
Hypercube rejection. This fancy-sounding method is actually really simple: you uniformly choose points (much more than n of them) inside of the cube surrounding the sphere, then reject the points outside of the sphere. Treat the remaining points as vectors, and normalize them. These are your “samples” – choose n of them using some method (randomly, greedy, etc).
Spiral approximations. You trace a spiral around a sphere, and evenly-distribute the points around the spiral. Because of the mathematics involved, these are more complicated to understand than the simulation, but much faster (and probably involving less code). The most popular seems to be by Saff, et al.
A lot more information about this problem can be found here
What you are looking for is called a spherical covering. The spherical covering problem is very hard and solutions are unknown except for small numbers of points. One thing that is known for sure is that given n points on a sphere, there always exist two points of distance d = (4-csc^2(\pi n/6(n-2)))^(1/2) or closer.
If you want a probabilistic method for generating points uniformly distributed on a sphere, it’s easy: generate points in space uniformly by Gaussian distribution (it’s built into Java, not hard to find the code for other languages). So in 3-dimensional space, you need something like
Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };
Then project the point onto the sphere by normalizing its distance from the origin
The Gaussian distribution in n dimensions is spherically symmetric so the projection onto the sphere is uniform.
Of course, there’s no guarantee that the distance between any two points in a collection of uniformly generated points will be bounded below, so you can use rejection to enforce any such conditions that you might have: probably it’s best to generate the whole collection and then reject the whole collection if necessary. (Or use “early rejection” to reject the whole collection you’ve generated so far; just don’t keep some points and drop others.) You can use the formula for d given above, minus some slack, to determine the min distance between points below which you will reject a set of points. You’ll have to calculate n choose 2 distances, and the probability of rejection will depend on the slack; it’s hard to say how, so run a simulation to get a feel for the relevant statistics.
from math import cos, sin, pi, sqrt
defGetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):""" each point you get will be of form 'x, y, z'; in cartesian coordinates
eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0
------------
converted from: http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere )
"""
dlong = pi*(3.0-sqrt(5.0))# ~2.39996323
dz =2.0/numberOfPoints
long =0.0
z =1.0- dz/2.0
ptsOnSphere =[]for k in range(0, numberOfPoints):
r = sqrt(1.0-z*z)
ptNew =(cos(long)*r, sin(long)*r, z)
ptsOnSphere.append( ptNew )
z = z - dz
long = long + dlong
return ptsOnSphere
if __name__ =='__main__':
ptsOnSphere =GetPointsEquiAngularlyDistancedOnSphere(80)#toggle True/False to print themif(True):for pt in ptsOnSphere:print( pt)#toggle True/False to plot themif(True):from numpy import*import pylab as p
import mpl_toolkits.mplot3d.axes3d as p3
fig=p.figure()
ax = p3.Axes3D(fig)
x_s=[];y_s=[]; z_s=[]for pt in ptsOnSphere:
x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])
ax.scatter3D( array( x_s), array( y_s), array( z_s))
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
p.show()#end
This answer is based on the same ‘theory’ that is outlined well by this answer
I’m adding this answer as:
— None of the other options fit the ‘uniformity’ need ‘spot-on’ (or not obviously-clearly so). (Noting to get the planet like distribution looking behavior particurally wanted in the original ask, you just reject from the finite list of the k uniformly created points at random (random wrt the index count in the k items back).)
–The closest other impl forced you to decide the ‘N’ by ‘angular axis’, vs. just ‘one value of N’ across both angular axis values ( which at low counts of N is very tricky to know what may, or may not matter (e.g. you want ‘5’ points — have fun ) )
–Furthermore, it’s very hard to ‘grok’ how to differentiate between the other options without any imagery, so here’s what this option looks like (below), and the ready-to-run implementation that goes with it.
from math import cos, sin, pi, sqrt
def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
""" each point you get will be of form 'x, y, z'; in cartesian coordinates
eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0
------------
converted from: http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere )
"""
dlong = pi*(3.0-sqrt(5.0)) # ~2.39996323
dz = 2.0/numberOfPoints
long = 0.0
z = 1.0 - dz/2.0
ptsOnSphere =[]
for k in range( 0, numberOfPoints):
r = sqrt(1.0-z*z)
ptNew = (cos(long)*r, sin(long)*r, z)
ptsOnSphere.append( ptNew )
z = z - dz
long = long + dlong
return ptsOnSphere
if __name__ == '__main__':
ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)
#toggle True/False to print them
if( True ):
for pt in ptsOnSphere: print( pt)
#toggle True/False to plot them
if(True):
from numpy import *
import pylab as p
import mpl_toolkits.mplot3d.axes3d as p3
fig=p.figure()
ax = p3.Axes3D(fig)
x_s=[];y_s=[]; z_s=[]
for pt in ptsOnSphere:
x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])
ax.scatter3D( array( x_s), array( y_s), array( z_s) )
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
p.show()
#end
tested at low counts (N in 2, 5, 7, 13, etc) and seems to work ‘nice’
回答 6
尝试:
function sphere ( N:float,k:int):Vector3{
var inc =Mathf.PI *(3-Mathf.Sqrt(5));
var off =2/ N;
var y = k * off -1+(off /2);
var r =Mathf.Sqrt(1- y*y);
var phi = k * inc;returnVector3((Mathf.Cos(phi)*r), y,Mathf.Sin(phi)*r);};
function sphere ( N:float,k:int):Vector3 {
var inc = Mathf.PI * (3 - Mathf.Sqrt(5));
var off = 2 / N;
var y = k * off - 1 + (off / 2);
var r = Mathf.Sqrt(1 - y*y);
var phi = k * inc;
return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r);
};
The above function should run in loop with N loop total and k loop current iteration.
It is based on a sunflower seeds pattern, except the sunflower seeds are curved around into a half dome, and again into a sphere.
It’s probably overkill, but maybe after looking at it you’ll realize some of it’s other nice properties are interesting to you. It’s way more than just a function that outputs a point cloud.
I landed here trying to find it again; the name “healpix” doesn’t exactly evoke spheres…
回答 8
仅需少量点就可以运行模拟:
from random import random,randint
r =10
n =20
best_closest_d =0
best_points =[]
points =[(r,0,0)for i in range(n)]for simulation in range(10000):
x = random()*r
y = random()*r
z = r-(x**2+y**2)**0.5if randint(0,1):
x =-x
if randint(0,1):
y =-y
if randint(0,1):
z =-z
closest_dist =(2*r)**2
closest_index =Nonefor i in range(n):for j in range(n):if i==j:continue
p1,p2 = points[i],points[j]
x1,y1,z1 = p1
x2,y2,z2 = p2
d =(x1-x2)**2+(y1-y2)**2+(z1-z2)**2if d < closest_dist:
closest_dist = d
closest_index = i
if simulation %100==0:print simulation,closest_dist
if closest_dist > best_closest_d:
best_closest_d = closest_dist
best_points = points[:]
points[closest_index]=(x,y,z)print best_points
>>> best_points
[(9.921692138442777,-9.930808529773849,4.037839326088124),(5.141893371460546,1.7274947332807744,-4.575674650522637),(-4.917695758662436,-1.090127967097737,-4.9629263893193745),(3.6164803265540666,7.004158551438312,-2.1172868271109184),(-9.550655088997003,-9.580386054762917,3.5277052594769422),(-0.062238110294250415,6.803105171979587,3.1966101417463655),(-9.600996012203195,9.488067284474834,-3.498242301168819),(-8.601522086624803,4.519484132245867,-0.2834204048792728),(-1.1198210500791472,-2.2916581379035694,7.44937337008726),(7.981831370440529,8.539378431788634,1.6889099589074377),(0.513546008372332,-2.974333486904779,-6.981657873262494),(-4.13615438946178,-6.707488383678717,2.1197605651446807),(2.2859494919024326,-8.14336582650039,1.5418694699275672),(-7.241410895247996,9.907335206038226,2.271647103735541),(-9.433349952523232,-7.999106443463781,-2.3682575660694347),(3.704772125650199,1.0526567864085812,6.148581714099761),(-3.5710511242327048,5.512552040316693,-3.4318468250897647),(-7.483466337225052,-1.506434920354559,2.36641535124918),(7.73363824231576,-8.460241422163824,-1.4623228616326003),(10,0,0)]
Take the two largest factors of your N, if N==20 then the two largest factors are {5,4}, or, more generally {a,b}. Calculate
dlat = 180/(a+1)
dlong = 360/(b+1})
Put your first point at {90-dlat/2,(dlong/2)-180}, your second at {90-dlat/2,(3*dlong/2)-180}, your 3rd at {90-dlat/2,(5*dlong/2)-180}, until you’ve tripped round the world once, by which time you’ve got to about {75,150} when you go next to {90-3*dlat/2,(dlong/2)-180}.
Obviously I’m working this in degrees on the surface of the spherical earth, with the usual conventions for translating +/- to N/S or E/W. And obviously this gives you a completely non-random distribution, but it is uniform and the points are not bunched together.
To add some degree of randomness, you could generate 2 normally-distributed (with mean 0 and std dev of {dlat/3, dlong/3} as appropriate) and add them to your uniformly distributed points.
Your language typically has a uniform random number primitive. For example in python you can use random.random() to return a number in the range [0,1). You can multiply this number by k to get a random number in the range [0,k). Thus in python, uniform([0,2pi)) would mean random.random()*2*math.pi.
Proof
Now we can’t assign θ uniformly, otherwise we’d get clumping at the poles. We wish to assign probabilities proportional to the surface area of the spherical wedge (the θ in this diagram is actually φ):
An angular displacement dφ at the equator will result in a displacement of dφ*r. What will that displacement be at an arbitrary azimuth θ? Well, the radius from the z-axis is r*sin(θ), so the arclength of that “latitude” intersecting the wedge is dφ * r*sin(θ). Thus we calculate the cumulative distribution of the area to sample from it, by integrating the area of the slice from the south pole to the north pole.
OR… to place 20 points, compute the centers of the icosahedronal faces. For 12 points, find the vertices of the icosahedron. For 30 points, the mid point of the edges of the icosahedron. you can do the same thing with the tetrahedron, cube, dodecahedron and octahedrons: one set of points is on the vertices, another on the center of the face and another on the center of the edges. They cannot be mixed, however.
@robert king It’s a really nice solution but has some sloppy bugs in it. I know it helped me a lot though, so never mind the sloppiness. :)
Here is a cleaned up version….
from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2
lat_dist = lambda lat, R=1.0: R*(1-sin(lat))
#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
if -halfpi > lat or lat > halfpi:
raise ValueError("lat must be between -halfpi and halfpi")
return 2 * pi * R ** 2 * (1-sin(lat))
sphere_lonarea = lambda lon, R=1.0: \
4 * pi * R ** 2 * lon / twopi
#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
# = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
(sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi
def test_sphere(n_lats=10, n_lons=19, radius=540.0):
total_area = 0.0
for i_lons in range(n_lons):
lon0 = twopi * float(i_lons) / n_lons
lon1 = twopi * float(i_lons+1) / n_lons
for i_lats in range(n_lats):
lat0 = asin(2 * float(i_lats) / n_lats - 1)
lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
print("{:} {:}: {:9.4f} to {:9.4f}, {:9.4f} to {:9.4f} => area {:10.4f}"
.format(i_lats, i_lons
, degrees(lat0), degrees(lat1)
, degrees(lon0), degrees(lon1)
, area))
total_area += area
print("total_area = {:10.4f} (difference of {:10.4f})"
.format(total_area, abs(total_area) - sphere_area(radius)))
test_sphere()
回答 14
这行得通,而且非常简单。您想要的点数:
private function moveTweets():void {
var newScale:Number=Scale(meshes.length,50,500,6,2);
trace("new scale:"+newScale);
var l:Number=this.meshes.length;
var tweetMeshInstance:TweetMesh;
var destx:Number;
var desty:Number;
var destz:Number;for(var i:Number=0;i<this.meshes.length;i++){
tweetMeshInstance=meshes[i];
var phi:Number=Math.acos(-1+(2* i )/ l );
var theta:Number=Math.sqrt( l *Math.PI )* phi;
tweetMeshInstance.origX =(sphereRadius+5)*Math.cos( theta )*Math.sin( phi );
tweetMeshInstance.origY=(sphereRadius+5)*Math.sin( theta )*Math.sin( phi );
tweetMeshInstance.origZ =(sphereRadius+5)*Math.cos( phi );
destx=sphereRadius *Math.cos( theta )*Math.sin( phi );
desty=sphereRadius *Math.sin( theta )*Math.sin( phi );
destz=sphereRadius *Math.cos( phi );
tweetMeshInstance.lookAt(new Vector3D());TweenMax.to(tweetMeshInstance,1,{scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});}}
private function onLookAtTween(theMesh:TweetMesh):void {
theMesh.lookAt(new Vector3D());}
Does some standard Python module contain a function to compute modular multiplicative inverse of a number, i.e. a number y = invmod(x, p) such that x*y == 1 (mod p)? Google doesn’t seem to give any good hints on this.
Of course, one can come up with home-brewed 10-liner of extended Euclidean algorithm, but why reinvent the wheel.
For example, Java’s BigInteger has modInverse method. Doesn’t Python have something similar?
def egcd(a, b):if a ==0:return(b,0,1)else:
g, y, x = egcd(b % a, a)return(g, x -(b // a)* y, y)def modinv(a, m):
g, x, y = egcd(a, m)if g !=1:raiseException('modular inverse does not exist')else:return x % m
Maybe someone will find this useful (from wikibooks):
def egcd(a, b):
if a == 0:
return (b, 0, 1)
else:
g, y, x = egcd(b % a, a)
return (g, x - (b // a) * y, y)
def modinv(a, m):
g, x, y = egcd(a, m)
if g != 1:
raise Exception('modular inverse does not exist')
else:
return x % m
You might also want to look at the gmpy module. It is an interface between Python and the GMP multiple-precision library. gmpy provides an invert function that does exactly what you need:
As noted by @hyh , the gmpy.invert() returns 0 if the inverse does not exist. That matches the behavior of GMP’s mpz_invert() function. gmpy.divm(a, b, m) provides a general solution to a=bx (mod m).
>>> gmpy.divm(1, 1234567, 1000000007)
mpz(989145189)
>>> gmpy.divm(1, 0, 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 9)
mpz(7)
divm() will return a solution when gcd(b,m) == 1 and raises an exception when the multiplicative inverse does not exist.
Disclaimer: I’m the current maintainer of the gmpy library.
Updated answer 2
gmpy2 now properly raises an exception when the inverse does not exists:
>>> import gmpy2
>>> gmpy2.invert(0,5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: invert() no inverse exists
Sympy, a python module for symbolic mathematics, has a built-in modular inverse function if you don’t want to implement your own (or if you’re using Sympy already):
from sympy import mod_inverse
mod_inverse(11, 35) # returns 16
mod_inverse(15, 35) # raises ValueError: 'inverse of 15 (mod 35) does not exist'
# a is the number you want the inverse for# b is the modulusdef mod_inverse(a, b):
r =-1
B = b
A = a
eq_set =[]
full_set =[]
mod_set =[]#euclid's algorithmwhile r!=1and r!=0:
r = b%a
q = b//a
eq_set =[r, b, a, q*-1]
b = a
a = r
full_set.append(eq_set)for i in range(0,4):
mod_set.append(full_set[-1][i])
mod_set.insert(2,1)
counter =0#extended euclid's algorithmfor i in range(1, len(full_set)):if counter%2==0:
mod_set[2]= full_set[-1*(i+1)][3]*mod_set[4]+mod_set[2]
mod_set[3]= full_set[-1*(i+1)][1]elif counter%2!=0:
mod_set[4]= full_set[-1*(i+1)][3]*mod_set[2]+mod_set[4]
mod_set[1]= full_set[-1*(i+1)][1]
counter +=1if mod_set[3]== B:return mod_set[2]%B
return mod_set[4]%B
Here is my code, it might be sloppy but it seems to work for me anyway.
# a is the number you want the inverse for
# b is the modulus
def mod_inverse(a, b):
r = -1
B = b
A = a
eq_set = []
full_set = []
mod_set = []
#euclid's algorithm
while r!=1 and r!=0:
r = b%a
q = b//a
eq_set = [r, b, a, q*-1]
b = a
a = r
full_set.append(eq_set)
for i in range(0, 4):
mod_set.append(full_set[-1][i])
mod_set.insert(2, 1)
counter = 0
#extended euclid's algorithm
for i in range(1, len(full_set)):
if counter%2 == 0:
mod_set[2] = full_set[-1*(i+1)][3]*mod_set[4]+mod_set[2]
mod_set[3] = full_set[-1*(i+1)][1]
elif counter%2 != 0:
mod_set[4] = full_set[-1*(i+1)][3]*mod_set[2]+mod_set[4]
mod_set[1] = full_set[-1*(i+1)][1]
counter += 1
if mod_set[3] == B:
return mod_set[2]%B
return mod_set[4]%B
The code above will not run in python3 and is less efficient compared to the GCD variants. However, this code is very transparent. It triggered me to create a more compact version:
def imod(a, n):
c = 1
while (c % a > 0):
c += n
return c // a
回答 8
这是一个简洁的1-liner,无需使用任何外部库即可执行此操作。
# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).# In particular, if a,b are relatively prime, returns the inverse of a modulo b.def invmod(a,b):return0if a==0else1if b%a==0else b - invmod(b%a,a)*b//a
Here is a concise 1-liner that does it, without using any external libraries.
# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).
# In particular, if a,b are relatively prime, returns the inverse of a modulo b.
def invmod(a,b): return 0 if a==0 else 1 if b%a==0 else b - invmod(b%a,a)*b//a
Note that this is really just egcd, streamlined to return only the single coefficient of interest.
回答 9
为了弄清楚模块化乘法逆,我建议使用扩展欧几里得算法,如下所示:
def multiplicative_inverse(a, b):
origA = a
X =0
prevX =1
Y =1
prevY =0while b !=0:
temp = b
quotient = a/b
b = a%b
a = temp
temp = X
a = prevX - quotient * X
prevX = temp
temp = Y
Y = prevY - quotient * Y
prevY = temp
return origA + prevY
To figure out the modular multiplicative inverse I recommend using the Extended Euclidean Algorithm like this:
def multiplicative_inverse(a, b):
origA = a
X = 0
prevX = 1
Y = 1
prevY = 0
while b != 0:
temp = b
quotient = a/b
b = a%b
a = temp
temp = X
a = prevX - quotient * X
prevX = temp
temp = Y
Y = prevY - quotient * Y
prevY = temp
return origA + prevY
回答 10
我从该线程尝试了不同的解决方案,最后我使用了一个:
def egcd(a, b):
lastremainder, remainder = abs(a), abs(b)
x, lastx, y, lasty =0,1,1,0while remainder:
lastremainder,(quotient, remainder)= remainder, divmod(lastremainder, remainder)
x, lastx = lastx - quotient*x, x
y, lasty = lasty - quotient*y, y
return lastremainder, lastx *(-1if a <0else1), lasty *(-1if b <0else1)def modinv(a, m):
g, x, y = self.egcd(a, m)if g !=1:raiseValueError('modinv for {} does not exist'.format(a))return x % m
I try different solutions from this thread and in the end I use this one:
def egcd(a, b):
lastremainder, remainder = abs(a), abs(b)
x, lastx, y, lasty = 0, 1, 1, 0
while remainder:
lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
x, lastx = lastx - quotient*x, x
y, lasty = lasty - quotient*y, y
return lastremainder, lastx * (-1 if a < 0 else 1), lasty * (-1 if b < 0 else 1)
def modinv(a, m):
g, x, y = self.egcd(a, m)
if g != 1:
raise ValueError('modinv for {} does not exist'.format(a))
return x % m
Well, I don’t have a function in python but I have a function in C which you can easily convert to python, in the below c function extended euclidian algorithm is used to calculate inverse mod.
int imod(int a,int n){
int c,i=1;
while(1){
c = n * i + 1;
if(c%a==0){
c = c/a;
break;
}
i++;
}
return c;}
Python Function
def imod(a,n):
i=1
while True:
c = n * i + 1;
if(c%a==0):
c = c/a
break;
i = i+1
return c
def invmod(a, n):
b, c =1,0while n:
q, r = divmod(a, n)
a, b, c, n = n, c, b - q*c, r
# at this point a is the gcd of the original inputsif a ==1:return b
raiseValueError("Not invertible")
def invmod(a, n):
b, c = 1, 0
while n:
q, r = divmod(a, n)
a, b, c, n = n, c, b - q*c, r
# at this point a is the gcd of the original inputs
if a == 1:
return b
raise ValueError("Not invertible")
according to the comment above this code, it can return small negative values, so you could potentially check if negative and add n when negative before returning b.
First thing that cames to mind is to go through all possible words (starting with first letter) and find the longest word possible, continue from position=word_position+len(word)
P.S.
We have a list of all possible words.
Word “cupboard” can be “cup” and “board”, select longest.
Language: python, but main thing is the algorithm itself.
from math import log# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words))))for i,k in enumerate(words))
maxword = max(len(x)for x in words)def infer_spaces(s):"""Uses dynamic programming to infer the location of spaces in a string
without spaces."""# Find the best match for the i first characters, assuming cost has# been built for the i-1 first characters.# Returns a pair (match_cost, match_length).def best_match(i):
candidates = enumerate(reversed(cost[max(0, i-maxword):i]))return min((c + wordcost.get(s[i-k-1:i],9e999), k+1)for k,c in candidates)# Build the cost array.
cost =[0]for i in range(1,len(s)+1):
c,k = best_match(i)
cost.append(c)# Backtrack to recover the minimal-cost string.
out =[]
i = len(s)while i>0:
c,k = best_match(i)assert c == cost[i]
out.append(s[i-k:i])
i -= kreturn" ".join(reversed(out))
您可以使用
s ='thumbgreenappleactiveassignmentweeklymetaphor'print(infer_spaces(s))
A naive algorithm won’t give good results when applied to real-world data. Here is a 20-line algorithm that exploits relative word frequency to give accurate results for real-word text.
(If you want an answer to your original question which does not use word frequency, you need to refine what exactly is meant by “longest word”: is it better to have a 20-letter word and ten 3-letter words, or is it better to have five 10-letter words? Once you settle on a precise definition, you just have to change the line defining wordcost to reflect the intended meaning.)
The idea
The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf’s law, that is the word with rank n in the list of words has probability roughly 1/(n log N) where N is the number of words in the dictionary.
Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probability of each individual word, and it’s easy to compute it with dynamic programming. Instead of directly using the probability we use a cost defined as the logarithm of the inverse of the probability to avoid overflows.
The code
from math import log
# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)
def infer_spaces(s):
"""Uses dynamic programming to infer the location of spaces in a string
without spaces."""
# Find the best match for the i first characters, assuming cost has
# been built for the i-1 first characters.
# Returns a pair (match_cost, match_length).
def best_match(i):
candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)
# Build the cost array.
cost = [0]
for i in range(1,len(s)+1):
c,k = best_match(i)
cost.append(c)
# Backtrack to recover the minimal-cost string.
out = []
i = len(s)
while i>0:
c,k = best_match(i)
assert c == cost[i]
out.append(s[i-k:i])
i -= k
return " ".join(reversed(out))
which you can use with
s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))
After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.
After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.
As you can see it is essentially flawless. The most important part is to make sure your word list was trained to a corpus similar to what you will actually encounter, otherwise the results will be very bad.
Optimization
The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.
If you need to process a very large consecutive string it would be reasonable to split the string to avoid excessive memory usage. For example you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will have almost certainly no effect on the quality.
The only differences are minor. This returns a list rather than a str, it works in python3, it includes the word list and properly splits even if there are non-alpha chars (like underscores, dashes, etc).
def find_words(instring, prefix ='', words =None):ifnot instring:return[]if words isNone:
words = set()with open('/usr/share/dict/words')as f:for line in f:
words.add(line.strip())if(not prefix)and(instring in words):return[instring]
prefix, suffix = prefix + instring[0], instring[1:]
solutions =[]# Case 1: prefix in solutionif prefix in words:try:
solutions.append([prefix]+ find_words(suffix,'', words))exceptValueError:pass# Case 2: prefix not in solutiontry:
solutions.append(find_words(suffix, prefix, words))exceptValueError:passif solutions:return sorted(solutions,
key =lambda solution:[len(word)for word in solution],
reverse =True)[0]else:raiseValueError('no solution')print(find_words('tableapplechairtablecupboard'))print(find_words('tableprechaun', words = set(['tab','table','leprechaun'])))
def find_words(instring, prefix = '', words = None):
if not instring:
return []
if words is None:
words = set()
with open('/usr/share/dict/words') as f:
for line in f:
words.add(line.strip())
if (not prefix) and (instring in words):
return [instring]
prefix, suffix = prefix + instring[0], instring[1:]
solutions = []
# Case 1: prefix in solution
if prefix in words:
try:
solutions.append([prefix] + find_words(suffix, '', words))
except ValueError:
pass
# Case 2: prefix not in solution
try:
solutions.append(find_words(suffix, prefix, words))
except ValueError:
pass
if solutions:
return sorted(solutions,
key = lambda solution: [len(word) for word in solution],
reverse = True)[0]
else:
raise ValueError('no solution')
print(find_words('tableapplechairtablecupboard'))
print(find_words('tableprechaun', words = set(['tab', 'table', 'leprechaun'])))
Using a triedata structure, which holds the list of possible words, it would not be too complicated to do the following:
Advance pointer (in the concatenated string)
Lookup and store the corresponding node in the trie
If the trie node has children (e.g. there are longer words), go to 1.
If the node reached has no children, a longest word match happened; add the word (stored in the node or just concatenated during trie traversal) to the result list, reset the pointer in the trie (or reset the reference), and start over
words = set()with open('/usr/share/dict/words')as f:for line in f:
words.add(line.strip())
solutions ={}def find_words(instring):# First check if instring is in the dictionnaryif instring in words:return[instring]# No... But maybe it's a result we already computedif instring in solutions:return solutions[instring]# Nope. Try to split the string at all position to recursively search for results
best_solution =Nonefor i in range(1, len(instring)-1):
part1 = find_words(instring[:i])
part2 = find_words(instring[i:])# Both parts MUST have a solutionif part1 isNoneor part2 isNone:continue
solution = part1 + part2
# Is the solution found "better" than the previous one?if best_solution isNoneor len(solution)< len(best_solution):
best_solution = solution
# Remember (memoize) this solution to avoid having to recompute it
solutions[instring]= best_solution
return best_solution
在我的3GHz机器上,这大约需要5秒:
result = find_words("thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot")assert(result isnotNone)print' '.join(result)
Unutbu’s solution was quite close but I find the code difficult to read, and it didn’t yield the expected result. Generic Human’s solution has the drawback that it needs word frequencies. Not appropriate for all use case.
It tries to minimize the number of words E.g. find_words('cupboard') will return ['cupboard'] rather than ['cup', 'board'] (assuming that cupboard, cup and board are in the dictionnary)
The optimal solution is not unique, the implementation below returns a solution. find_words('charactersin') could return ['characters', 'in'] or maybe it will return ['character', 'sin'] (as seen below). You could quite easily modify the algorithm to return all optimal solutions.
In this implementation solutions are memoized so that it runs in a reasonable time.
The code:
words = set()
with open('/usr/share/dict/words') as f:
for line in f:
words.add(line.strip())
solutions = {}
def find_words(instring):
# First check if instring is in the dictionnary
if instring in words:
return [instring]
# No... But maybe it's a result we already computed
if instring in solutions:
return solutions[instring]
# Nope. Try to split the string at all position to recursively search for results
best_solution = None
for i in range(1, len(instring) - 1):
part1 = find_words(instring[:i])
part2 = find_words(instring[i:])
# Both parts MUST have a solution
if part1 is None or part2 is None:
continue
solution = part1 + part2
# Is the solution found "better" than the previous one?
if best_solution is None or len(solution) < len(best_solution):
best_solution = solution
# Remember (memoize) this solution to avoid having to recompute it
solutions[instring] = best_solution
return best_solution
This will take about about 5sec on my 3GHz machine:
result = find_words("thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot")
assert(result is not None)
print ' '.join(result)
the reis masses of text information of peoples comments which is parsed from h t m l but there are no delimited character sin them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple e t c in the string i also have a large dictionary to query whether the word is reasonable so whats the fastest way of extraction t h x a lot
但是,当您具有bigram模型时,可以将p(’sit down’)视为bigram vs p(’sitdown’),并且前者获胜。基本上,如果您不使用双字母组,它将把您拆分的单词的概率视为独立,而不是这种情况,某些单词更有可能一个接一个地出现。不幸的是,这些词在很多情况下经常被粘在一起,使拆分器感到困惑。
import re, string, random, glob, operator, heapq
from collections import defaultdict
from math import log10
def memo(f):"Memoize function f."
table ={}def fmemo(*args):if args notin table:
table[args]= f(*args)return table[args]
fmemo.memo = table
return fmemo
def test(verbose=None):"""Run some tests, taken from the chapter.
Since the hillclimbing algorithm is randomized, some tests may fail."""import doctest
print'Running tests...'
doctest.testfile('ngrams-test.txt', verbose=verbose)################ Word Segmentation (p. 223)@memodef segment(text):"Return a list of words that is the best segmentation of text."ifnot text:return[]
candidates =([first]+segment(rem)for first,rem in splits(text))return max(candidates, key=Pwords)def splits(text, L=20):"Return a list of all possible (first, rem) pairs, len(first)<=L."return[(text[:i+1], text[i+1:])for i in range(min(len(text), L))]defPwords(words):"The Naive Bayes probability of a sequence of words."return product(Pw(w)for w in words)#### Support functions (p. 224)def product(nums):"Return the product of a sequence of numbers."return reduce(operator.mul, nums,1)classPdist(dict):"A probability distribution estimated from counts in datafile."def __init__(self, data=[], N=None, missingfn=None):for key,count in data:
self[key]= self.get(key,0)+ int(count)
self.N = float(N or sum(self.itervalues()))
self.missingfn = missingfn or(lambda k, N:1./N)def __call__(self, key):if key in self:return self[key]/self.N
else:return self.missingfn(key, self.N)def datafile(name, sep='\t'):"Read key,value pairs from file."for line in file(name):yield line.split(sep)def avoid_long_words(key, N):"Estimate the probability of an unknown word."return10./(N *10**len(key))
N =1024908267229## Number of tokensPw=Pdist(datafile('count_1w.txt'), N, avoid_long_words)#### segment2: second version, with bigram counts, (p. 226-227)def cPw(word, prev):"Conditional probability of word, given previous word."try:return P2w[prev +' '+ word]/float(Pw[prev])exceptKeyError:returnPw(word)
P2w =Pdist(datafile('count_2w.txt'), N)@memodef segment2(text, prev='<S>'):"Return (log P(words), words), where words is the best segmentation."ifnot text:return0.0,[]
candidates =[combine(log10(cPw(first, prev)), first, segment2(rem, first))for first,rem in splits(text)]return max(candidates)def combine(Pfirst, first,(Prem, rem)):"Combine first and rem results into one (probability, words) pair."returnPfirst+Prem,[first]+rem
Before I paste his code, let me expand on why Norvig’s method is more accurate (although a little slower and longer in terms of code).
1) The data is a bit better – both in terms of size and in terms of precision (he uses a word count rather than a simple ranking)
2) More importantly, it’s the logic behind n-grams that really makes the approach so accurate.
The example he provides in his book is the problem of splitting a string ‘sitdown’. Now a non-bigram method of string split would consider p(‘sit’) * p (‘down’), and if this less than the p(‘sitdown’) – which will be the case quite often – it will NOT split it, but we’d want it to (most of the time).
However when you have the bigram model you could value p(‘sit down’) as a bigram vs p(‘sitdown’) and the former wins. Basically, if you don’t use bigrams, it treats the probability of the words you’re splitting as independent, which is not the case, some words are more likely to appear one after the other. Unfortunately those are also the words that are often stuck together in a lot of instances and confuses the splitter.
Here’s the link to the data (it’s data for 3 separate problems and segmentation is only one. Please read the chapter for details): http://norvig.com/ngrams/
These links have been up a while, but I’ll copy paste the segmentation part of the code here anyway
import re, string, random, glob, operator, heapq
from collections import defaultdict
from math import log10
def memo(f):
"Memoize function f."
table = {}
def fmemo(*args):
if args not in table:
table[args] = f(*args)
return table[args]
fmemo.memo = table
return fmemo
def test(verbose=None):
"""Run some tests, taken from the chapter.
Since the hillclimbing algorithm is randomized, some tests may fail."""
import doctest
print 'Running tests...'
doctest.testfile('ngrams-test.txt', verbose=verbose)
################ Word Segmentation (p. 223)
@memo
def segment(text):
"Return a list of words that is the best segmentation of text."
if not text: return []
candidates = ([first]+segment(rem) for first,rem in splits(text))
return max(candidates, key=Pwords)
def splits(text, L=20):
"Return a list of all possible (first, rem) pairs, len(first)<=L."
return [(text[:i+1], text[i+1:])
for i in range(min(len(text), L))]
def Pwords(words):
"The Naive Bayes probability of a sequence of words."
return product(Pw(w) for w in words)
#### Support functions (p. 224)
def product(nums):
"Return the product of a sequence of numbers."
return reduce(operator.mul, nums, 1)
class Pdist(dict):
"A probability distribution estimated from counts in datafile."
def __init__(self, data=[], N=None, missingfn=None):
for key,count in data:
self[key] = self.get(key, 0) + int(count)
self.N = float(N or sum(self.itervalues()))
self.missingfn = missingfn or (lambda k, N: 1./N)
def __call__(self, key):
if key in self: return self[key]/self.N
else: return self.missingfn(key, self.N)
def datafile(name, sep='\t'):
"Read key,value pairs from file."
for line in file(name):
yield line.split(sep)
def avoid_long_words(key, N):
"Estimate the probability of an unknown word."
return 10./(N * 10**len(key))
N = 1024908267229 ## Number of tokens
Pw = Pdist(datafile('count_1w.txt'), N, avoid_long_words)
#### segment2: second version, with bigram counts, (p. 226-227)
def cPw(word, prev):
"Conditional probability of word, given previous word."
try:
return P2w[prev + ' ' + word]/float(Pw[prev])
except KeyError:
return Pw(word)
P2w = Pdist(datafile('count_2w.txt'), N)
@memo
def segment2(text, prev='<S>'):
"Return (log P(words), words), where words is the best segmentation."
if not text: return 0.0, []
candidates = [combine(log10(cPw(first, prev)), first, segment2(rem, first))
for first,rem in splits(text)]
return max(candidates)
def combine(Pfirst, first, (Prem, rem)):
"Combine first and rem results into one (probability, words) pair."
return Pfirst+Prem, [first]+rem
var fs = require("fs");
var splitRegex = new RegExp("[^a-zA-Z0-9']+","g");
var maxWordLen =0;
var wordCost ={};
fs.readFile("./wordninja_words.txt",'utf8', function(err, data){if(err){
throw err;}
var words = data.split('\n');
words.forEach(function(word, index){
wordCost[word]=Math.log((index +1)*Math.log(words.length));})
words.forEach(function(word){if(word.length > maxWordLen)
maxWordLen = word.length;});
console.log(maxWordLen)
splitRegex = new RegExp("[^a-zA-Z0-9']+","g");
console.log(split(process.argv[2]));});
function split(s){
var list =[];
s.split(splitRegex).forEach(function(sub){
_split(sub).forEach(function(word){
list.push(word);})})return list;}
module.exports = split;
function _split(s){
var cost =[0];
function best_match(i){
var candidates = cost.slice(Math.max(0, i - maxWordLen), i).reverse();
var minPair =[Number.MAX_SAFE_INTEGER,0];
candidates.forEach(function(c, k){if(wordCost[s.substring(i - k -1, i).toLowerCase()]){
var ccost = c + wordCost[s.substring(i - k -1, i).toLowerCase()];}else{
var ccost =Number.MAX_SAFE_INTEGER;}if(ccost < minPair[0]){
minPair =[ccost, k +1];}})return minPair;}for(var i =1; i < s.length +1; i++){
cost.push(best_match(i)[0]);}
var out =[];
i = s.length;while(i >0){
var c = best_match(i)[0];
var k = best_match(i)[1];if(c == cost[i])
console.log("Alert: "+ c);
var newToken = true;if(s.slice(i - k, i)!="'"){if(out.length >0){if(out[-1]=="'s"||(Number.isInteger(s[i -1])&&Number.isInteger(out[-1][0]))){
out[-1]= s.slice(i - k, i)+ out[-1];
newToken = false;}}}if(newToken){
out.push(s.slice(i - k, i))}
i -= k
}return out.reverse();}
Here is the accepted answer translated to JavaScript (requires node.js, and the file “wordninja_words.txt” from https://github.com/keredson/wordninja):
var fs = require("fs");
var splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
var maxWordLen = 0;
var wordCost = {};
fs.readFile("./wordninja_words.txt", 'utf8', function(err, data) {
if (err) {
throw err;
}
var words = data.split('\n');
words.forEach(function(word, index) {
wordCost[word] = Math.log((index + 1) * Math.log(words.length));
})
words.forEach(function(word) {
if (word.length > maxWordLen)
maxWordLen = word.length;
});
console.log(maxWordLen)
splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
console.log(split(process.argv[2]));
});
function split(s) {
var list = [];
s.split(splitRegex).forEach(function(sub) {
_split(sub).forEach(function(word) {
list.push(word);
})
})
return list;
}
module.exports = split;
function _split(s) {
var cost = [0];
function best_match(i) {
var candidates = cost.slice(Math.max(0, i - maxWordLen), i).reverse();
var minPair = [Number.MAX_SAFE_INTEGER, 0];
candidates.forEach(function(c, k) {
if (wordCost[s.substring(i - k - 1, i).toLowerCase()]) {
var ccost = c + wordCost[s.substring(i - k - 1, i).toLowerCase()];
} else {
var ccost = Number.MAX_SAFE_INTEGER;
}
if (ccost < minPair[0]) {
minPair = [ccost, k + 1];
}
})
return minPair;
}
for (var i = 1; i < s.length + 1; i++) {
cost.push(best_match(i)[0]);
}
var out = [];
i = s.length;
while (i > 0) {
var c = best_match(i)[0];
var k = best_match(i)[1];
if (c == cost[i])
console.log("Alert: " + c);
var newToken = true;
if (s.slice(i - k, i) != "'") {
if (out.length > 0) {
if (out[-1] == "'s" || (Number.isInteger(s[i - 1]) && Number.isInteger(out[-1][0]))) {
out[-1] = s.slice(i - k, i) + out[-1];
newToken = false;
}
}
}
if (newToken) {
out.push(s.slice(i - k, i))
}
i -= k
}
return out.reverse();
}
If you precompile the wordlist into a DFA (which will be very slow), then the time it takes to match an input will be proportional to the length of the string (in fact, only a little slower than just iterating over the string).
This is effectively a more general version of the trie algorithm which was mentioned earlier. I only mention it for completeless — as of yet, there’s no DFA implementation you can just use. RE2 would work, but I don’t know if the Python bindings let you tune how large you allow a DFA to be before it just throws away the compiled DFA data and does NFA search.
It seems like fairly mundane backtracking will do. Start at the beggining of the string. Scan right until you have a word. Then, call the function on the rest of the string. Function returns “false” if it scans all the way to the right without recognizing a word. Otherwise, returns the word it found and the list of words returned by the recursive call.
Example: “tableapple”. Finds “tab”, then “leap”, but no word in “ple”. No other word in “leapple”. Finds “table”, then “app”. “le” not a word, so tries apple, recognizes, returns.
To get longest possible, keep going, only emitting (rather than returning) correct solutions; then, choose the optimal one by any criterion you choose (maxmax, minmax, average, etc.)
classNode:def __init__(self, is_word=False):
self.children ={}
self.is_word = is_word
classTrieDictionary:def __init__(self, words=tuple()):
self.root =Node()for word in words:
self.add(word)def add(self, word):
node = self.root
for c in word:
node = node.children.setdefault(c,Node())
node.is_word =Truedef lookup(self, word, from_node=None):
node = self.root if from_node isNoneelse from_node
for c in word:try:
node = node.children[c]exceptKeyError:returnNonereturn node
def using_trie_longest_word_heuristic(s):
node =None
possible_indexes =[]# O(1) short-circuit if whole string is a word, doesn't go against longest-word winsif s in dictionary:return[ s ]for i in range(len(s)):# traverse the trie, char-wise to determine intermediate words
node = trie_dictionary.lookup(s[i], from_node=node)# no more words start this wayif node isNone:# iterate words we have encountered from biggest to smallestfor possible in possible_indexes[::-1]:# recurse to attempt to solve the remaining sub-string
end_of_phrase = using_trie_longest_word_heuristic(s[possible+1:])# if we have a solution, return this word + our solutionif end_of_phrase:return[ s[:possible+1]]+ end_of_phrase
# unsolvablebreak# if this is a leaf, append the index to the possible words listelif node.is_word:
possible_indexes.append(i)# empty string OR unsolvable case return[]
'peanutbutter'-not a word, go charwise
'p'-in trie, use this node
'e'-in trie, use this node
'a'-in trie and edge, store potential word and use this node
'n'-in trie, use this node
'u'-in trie, use this node
't'-in trie and edge, store potential word and use this node
'b'-notin trie from`peanut` vector
'butter'- remainder of longest is a word
Expanding on @miku’s suggestion to use a Trie, an append-only Trie is relatively straight-forward to implement in python:
class Node:
def __init__(self, is_word=False):
self.children = {}
self.is_word = is_word
class TrieDictionary:
def __init__(self, words=tuple()):
self.root = Node()
for word in words:
self.add(word)
def add(self, word):
node = self.root
for c in word:
node = node.children.setdefault(c, Node())
node.is_word = True
def lookup(self, word, from_node=None):
node = self.root if from_node is None else from_node
for c in word:
try:
node = node.children[c]
except KeyError:
return None
return node
We can then build a Trie-based dictionary from a set of words:
Which will produce a tree that looks like this (* indicates beginning or end of a word):
* -> a*
\\\
\\\-> p -> e -> a*
\\ \-> n -> u -> t*
\\
\\-> b -> u -> t*
\\ \-> t*
\\ \-> e*
\\ \-> r*
\
\-> n -> u -> t*
We can incorporate this into a solution by combining it with a heuristic about how to choose words. For example we can prefer longer words over shorter words:
def using_trie_longest_word_heuristic(s):
node = None
possible_indexes = []
# O(1) short-circuit if whole string is a word, doesn't go against longest-word wins
if s in dictionary:
return [ s ]
for i in range(len(s)):
# traverse the trie, char-wise to determine intermediate words
node = trie_dictionary.lookup(s[i], from_node=node)
# no more words start this way
if node is None:
# iterate words we have encountered from biggest to smallest
for possible in possible_indexes[::-1]:
# recurse to attempt to solve the remaining sub-string
end_of_phrase = using_trie_longest_word_heuristic(s[possible+1:])
# if we have a solution, return this word + our solution
if end_of_phrase:
return [ s[:possible+1] ] + end_of_phrase
# unsolvable
break
# if this is a leaf, append the index to the possible words list
elif node.is_word:
possible_indexes.append(i)
# empty string OR unsolvable case
return []
Because we maintain our position in the Trie as we search for longer and longer words, we traverse the trie at most once per possible solution (rather than 2 times for peanut: pea, peanut). The final short-circuit saves us from walking char-wise through the string in the worst-case.
The final result is only a handful of inspections:
'peanutbutter' - not a word, go charwise
'p' - in trie, use this node
'e' - in trie, use this node
'a' - in trie and edge, store potential word and use this node
'n' - in trie, use this node
'u' - in trie, use this node
't' - in trie and edge, store potential word and use this node
'b' - not in trie from `peanut` vector
'butter' - remainder of longest is a word
A benefit of this solution is in the fact that you know very quickly if longer words exist with a given prefix, which spares the need to exhaustively test sequence combinations against a dictionary. It also makes getting to an unsolvable answer comparatively cheap to other implementations.
The downsides of this solution are a large memory footprint for the trie and the cost of building the trie up-front.
Using list comprehension to iterate over the list to locate the word and how many times it appears.
string = "tableapplechairtablecupboard"
def split_string(string, word_list):
return ("".join([(item + " ")*string.count(item.lower()) for item in word_list if item.lower() in string])).strip()
The function returns a string output of words in order of the list table table apple chair cupboard
A small contribution of the same in Java from my side.
The public method splitContiguousWords could be embedded with the other 2 methods in the class having ninja_words.txt in same directory(or modified as per the choice of coder). And the method splitContiguousWords could be used for the purpose.
public List<String> splitContiguousWords(String sentence) {
String splitRegex = "[^a-zA-Z0-9']+";
Map<String, Number> wordCost = new HashMap<>();
List<String> dictionaryWords = IOUtils.linesFromFile("ninja_words.txt", StandardCharsets.UTF_8.name());
double naturalLogDictionaryWordsCount = Math.log(dictionaryWords.size());
long wordIdx = 0;
for (String word : dictionaryWords) {
wordCost.put(word, Math.log(++wordIdx * naturalLogDictionaryWordsCount));
}
int maxWordLength = Collections.max(dictionaryWords, Comparator.comparing(String::length)).length();
List<String> splitWords = new ArrayList<>();
for (String partSentence : sentence.split(splitRegex)) {
splitWords.add(split(partSentence, wordCost, maxWordLength));
}
log.info("Split word for the sentence: {}", splitWords);
return splitWords;
}
private String split(String partSentence, Map<String, Number> wordCost, int maxWordLength) {
List<Pair<Number, Number>> cost = new ArrayList<>();
cost.add(new Pair<>(Integer.valueOf(0), Integer.valueOf(0)));
for (int index = 1; index < partSentence.length() + 1; index++) {
cost.add(bestMatch(partSentence, cost, index, wordCost, maxWordLength));
}
int idx = partSentence.length();
List<String> output = new ArrayList<>();
while (idx > 0) {
Pair<Number, Number> candidate = bestMatch(partSentence, cost, idx, wordCost, maxWordLength);
Number candidateCost = candidate.getKey();
Number candidateIndexValue = candidate.getValue();
if (candidateCost.doubleValue() != cost.get(idx).getKey().doubleValue()) {
throw new RuntimeException("Candidate cost unmatched; This should not be the case!");
}
boolean newToken = true;
String token = partSentence.substring(idx - candidateIndexValue.intValue(), idx);
if (token != "\'" && output.size() > 0) {
String lastWord = output.get(output.size() - 1);
if (lastWord.equalsIgnoreCase("\'s") ||
(Character.isDigit(partSentence.charAt(idx - 1)) && Character.isDigit(lastWord.charAt(0)))) {
output.set(output.size() - 1, token + lastWord);
newToken = false;
}
}
if (newToken) {
output.add(token);
}
idx -= candidateIndexValue.intValue();
}
return String.join(" ", Lists.reverse(output));
}
private Pair<Number, Number> bestMatch(String partSentence, List<Pair<Number, Number>> cost, int index,
Map<String, Number> wordCost, int maxWordLength) {
List<Pair<Number, Number>> candidates = Lists.reverse(cost.subList(Math.max(0, index - maxWordLength), index));
int enumerateIdx = 0;
Pair<Number, Number> minPair = new Pair<>(Integer.MAX_VALUE, Integer.valueOf(enumerateIdx));
for (Pair<Number, Number> pair : candidates) {
++enumerateIdx;
String subsequence = partSentence.substring(index - enumerateIdx, index).toLowerCase();
Number minCost = Integer.MAX_VALUE;
if (wordCost.containsKey(subsequence)) {
minCost = pair.getKey().doubleValue() + wordCost.get(subsequence).doubleValue();
}
if (minCost.doubleValue() < minPair.getKey().doubleValue()) {
minPair = new Pair<>(minCost.doubleValue(), enumerateIdx);
}
}
return minPair;
}
回答 14
这会有所帮助
from wordsegment import load, segment
load()
segment('providesfortheresponsibilitiesofperson')
>>> s ='she sells sea shells by the sea shore'>>># Use hashlib>>>import hashlib
>>> int(hashlib.sha1(s).hexdigest(),16)%(10**8)58097614L>>># Use hash()>>> abs(hash(s))%(10**8)82148974
Yes, you can use the built-in hashlib modules or the built-in hash function. Then, chop-off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:
>>> s = 'she sells sea shells by the sea shore'
>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L
>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974
Raymond’s answer is great for python2 (though, you don’t need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you’ll need to make sure you are passing an encoded string. These days, in most circumstances, it’s probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:
If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:
var crypto = require('crypto');
var s ='she sells sea shells by the sea shore';
console.log(BigInt('0x'+ crypto.createHash('sha1').update(s).digest('hex'))%(10n**8n));
I am sharing our nodejs implementation of the solution as implemented by @Raymond Hettinger.
var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));
def divisorGenerator(n):
for i in xrange(1,n/2+1):
if n%i == 0: yield i
yield n
The result I’d like to get is similar to this one, but I’d like a smarter algorithm (this one it’s too much slow and dumb :-)
I can find prime factors and their multiplicity fast enough.
I’ve an generator that generates factor in this way:
(factor1, multiplicity1)
(factor2, multiplicity2)
(factor3, multiplicity3)
and so on…
i.e. the output of
for i in factorGenerator(100):
print i
is:
(2, 2)
(5, 2)
I don’t know how much is this useful for what I want to do (I coded it for other problems), anyway I’d like a smarter way to make
for i in divisorGen(100):
print i
output this:
1
2
4
5
10
20
25
50
100
UPDATE: Many thanks to Greg Hewgill and his “smart way” :)
Calculating all divisors of 100000000 took 0.01s with his way against the 39s that the dumb way took on my machine, very cool :D
UPDATE 2: Stop saying this is a duplicate of this post. Calculating the number of divisor of a given number doesn’t need to calculate all the divisors. It’s a different problem, if you think it’s not then look for “Divisor function” on wikipedia. Read the questions and the answer before posting, if you do not understand what is the topic just don’t add not useful and already given answers.
回答 0
给定您的factorGenerator功能,这里divisorGen应该可以工作:
def divisorGen(n):
factors = list(factorGenerator(n))
nfactors = len(factors)
f =[0]* nfactors
whileTrue:yield reduce(lambda x, y: x*y,[factors[x][0]**f[x]for x in range(nfactors)],1)
i =0whileTrue:
f[i]+=1if f[i]<= factors[i][1]:break
f[i]=0
i +=1if i >= nfactors:return
Given your factorGenerator function, here is a divisorGen that should work:
def divisorGen(n):
factors = list(factorGenerator(n))
nfactors = len(factors)
f = [0] * nfactors
while True:
yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
i = 0
while True:
f[i] += 1
if f[i] <= factors[i][1]:
break
f[i] = 0
i += 1
if i >= nfactors:
return
The overall efficiency of this algorithm will depend entirely on the efficiency of the factorGenerator.
import math
def divisorGenerator(n):
large_divisors =[]for i in xrange(1, int(math.sqrt(n)+1)):if n % i ==0:yield i
if i*i != n:
large_divisors.append(n / i)for divisor in reversed(large_divisors):yield divisor
print list(divisorGenerator(100))
To expand on what Shimi has said, you should only be running your loop from 1 to the square root of n. Then to find the pair, do n / i, and this will cover the whole problem space.
As was also noted, this is a NP, or ‘difficult’ problem. Exhaustive search, the way you are doing it, is about as good as it gets for guaranteed answers. This fact is used by encryption algorithms and the like to help secure them. If someone were to solve this problem, most if not all of our current ‘secure’ communication would be rendered insecure.
Python code:
import math
def divisorGenerator(n):
large_divisors = []
for i in xrange(1, int(math.sqrt(n) + 1)):
if n % i == 0:
yield i
if i*i != n:
large_divisors.append(n / i)
for divisor in reversed(large_divisors):
yield divisor
print list(divisorGenerator(100))
Which should output a list like:
[1, 2, 4, 5, 10, 20, 25, 50, 100]
回答 2
尽管已经有很多解决方案,但我确实必须发布此内容:)
这是:
可读的
短
自包含,可复制并粘贴
快速(在有很多主要因素和因数的情况下,比公认的解决方案快10倍以上)
符合python3,python2和pypy
码:
def divisors(n):# get factors and their counts
factors ={}
nn = n
i =2while i*i <= nn:while nn % i ==0:
factors[i]= factors.get(i,0)+1
nn //= i
i +=1if nn >1:
factors[nn]= factors.get(nn,0)+1
primes = list(factors.keys())# generates factors from primes[k:] subsetdef generate(k):if k == len(primes):yield1else:
rest = generate(k+1)
prime = primes[k]for factor in rest:
prime_to_i =1# prime_to_i iterates prime**i values, i being all possible exponentsfor _ in range(factors[prime]+1):yield factor * prime_to_i
prime_to_i *= prime
# in python3, `yield from generate(0)` would also workfor factor in generate(0):yield factor
Although there are already many solutions to this, I really have to post this :)
This one is:
readable
short
self contained, copy & paste ready
quick (in cases with a lot of prime factors and divisors, > 10 times faster than the accepted solution)
python3, python2 and pypy compliant
Code:
def divisors(n):
# get factors and their counts
factors = {}
nn = n
i = 2
while i*i <= nn:
while nn % i == 0:
factors[i] = factors.get(i, 0) + 1
nn //= i
i += 1
if nn > 1:
factors[nn] = factors.get(nn, 0) + 1
primes = list(factors.keys())
# generates factors from primes[k:] subset
def generate(k):
if k == len(primes):
yield 1
else:
rest = generate(k+1)
prime = primes[k]
for factor in rest:
prime_to_i = 1
# prime_to_i iterates prime**i values, i being all possible exponents
for _ in range(factors[prime] + 1):
yield factor * prime_to_i
prime_to_i *= prime
# in python3, `yield from generate(0)` would also work
for factor in generate(0):
yield factor
I think you can stop at math.sqrt(n) instead of n/2.
I will give you example so that you can understand it easily. Now the sqrt(28) is 5.29 so ceil(5.29) will be 6. So I if I will stop at 6 then I will can get all the divisors. How?
First see the code and then see image:
import math
def divisors(n):
divs = [1]
for i in xrange(2,int(math.sqrt(n))+1):
if n%i == 0:
divs.extend([i,n/i])
divs.extend([n])
return list(set(divs))
Now, See the image below:
Lets say I have already added 1 to my divisors list and I start with i=2 so
So at the end of all the iterations as I have added the quotient and the divisor to my list all the divisors of 28 are populated.
from math import sqrt
################################################################# cartesian product of lists ################################################################################################def appendEs2Sequences(sequences,es):
result=[]ifnot sequences:for e in es:
result.append([e])else:for e in es:
result+=[seq+[e]for seq in sequences]return result
def cartesianproduct(lists):"""
given a list of lists,
returns all the possible combinations taking one element from each list
The list does not have to be of equal length
"""return reduce(appendEs2Sequences,lists,[])################################################################# prime factors of a natural ################################################################################################def primefactors(n):'''lists prime factors, from greatest to smallest'''
i =2while i<=sqrt(n):if n%i==0:
l = primefactors(n/i)
l.append(i)return l
i+=1return[n]# n is prime################################################################# factorization of a natural ################################################################################################def factorGenerator(n):
p = primefactors(n)
factors={}for p1 in p:try:
factors[p1]+=1exceptKeyError:
factors[p1]=1return factors
def divisors(n):
factors = factorGenerator(n)
divisors=[]
listexponents=[map(lambda x:k**x,range(0,factors[k]+1))for k in factors.keys()]
listfactors=cartesianproduct(listexponents)for f in listfactors:
divisors.append(reduce(lambda x, y: x*y, f,1))
divisors.sort()return divisors
print divisors(60668796879)
I like Greg solution, but I wish it was more python like.
I feel it would be faster and more readable;
so after some time of coding I came out with this.
The first two functions are needed to make the cartesian product of lists.
And can be reused whnever this problem arises.
By the way, I had to program this myself, if anyone knows of a standard solution for this problem, please feel free to contact me.
“Factorgenerator” now returns a dictionary. And then the dictionary is fed into “divisors”, who uses it to generate first a list of lists, where each list is the list of the factors of the form p^n with p prime.
Then we make the cartesian product of those lists, and we finally use Greg’ solution to generate the divisor.
We sort them, and return them.
I tested it and it seem to be a bit faster than the previous version. I tested it as part of a bigger program, so I can’t really say how much is it faster though.
Pietro Speroni (pietrosperoni dot it)
from math import sqrt
##############################################################
### cartesian product of lists ##################################
##############################################################
def appendEs2Sequences(sequences,es):
result=[]
if not sequences:
for e in es:
result.append([e])
else:
for e in es:
result+=[seq+[e] for seq in sequences]
return result
def cartesianproduct(lists):
"""
given a list of lists,
returns all the possible combinations taking one element from each list
The list does not have to be of equal length
"""
return reduce(appendEs2Sequences,lists,[])
##############################################################
### prime factors of a natural ##################################
##############################################################
def primefactors(n):
'''lists prime factors, from greatest to smallest'''
i = 2
while i<=sqrt(n):
if n%i==0:
l = primefactors(n/i)
l.append(i)
return l
i+=1
return [n] # n is prime
##############################################################
### factorization of a natural ##################################
##############################################################
def factorGenerator(n):
p = primefactors(n)
factors={}
for p1 in p:
try:
factors[p1]+=1
except KeyError:
factors[p1]=1
return factors
def divisors(n):
factors = factorGenerator(n)
divisors=[]
listexponents=[map(lambda x:k**x,range(0,factors[k]+1)) for k in factors.keys()]
listfactors=cartesianproduct(listexponents)
for f in listfactors:
divisors.append(reduce(lambda x, y: x*y, f, 1))
divisors.sort()
return divisors
print divisors(60668796879)
P.S.
it is the first time I am posting to stackoverflow.
I am looking forward for any feedback.
回答 5
这是在纯Python 3.6中对10 ** 16左右的数字进行处理的一种智能,快速的方法,
from itertools import compress
def primes(n):""" Returns a list of primes < n for n > 2 """
sieve = bytearray([True])*(n//2)for i in range(3,int(n**0.5)+1,2):if sieve[i//2]:
sieve[i*i//2::i]= bytearray((n-i*i-1)//(2*i)+1)return[2,*compress(range(3,n,2), sieve[1:])]def factorization(n):""" Returns a list of the prime factorization of n """
pf =[]for p in primeslist:if p*p > n :break
count =0whilenot n % p:
n //= p
count +=1if count >0: pf.append((p, count))if n >1: pf.append((n,1))return pf
def divisors(n):""" Returns an unsorted list of the divisors of n """
divs =[1]for p, e in factorization(n):
divs +=[x*p**k for k in range(1,e+1)for x in divs]return divs
n =600851475143
primeslist = primes(int(n**0.5)+1)print(divisors(n))
Here is a smart and fast way to do it for numbers up to and around 10**16 in pure Python 3.6,
from itertools import compress
def primes(n):
""" Returns a list of primes < n for n > 2 """
sieve = bytearray([True]) * (n//2)
for i in range(3,int(n**0.5)+1,2):
if sieve[i//2]:
sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
return [2,*compress(range(3,n,2), sieve[1:])]
def factorization(n):
""" Returns a list of the prime factorization of n """
pf = []
for p in primeslist:
if p*p > n : break
count = 0
while not n % p:
n //= p
count += 1
if count > 0: pf.append((p, count))
if n > 1: pf.append((n, 1))
return pf
def divisors(n):
""" Returns an unsorted list of the divisors of n """
divs = [1]
for p, e in factorization(n):
divs += [x*p**k for k in range(1,e+1) for x in divs]
return divs
n = 600851475143
primeslist = primes(int(n**0.5)+1)
print(divisors(n))
from itertools import product
import operator
def prod(ls):return reduce(operator.mul, ls,1)def powered(factors, powers):return prod(f**p for(f,p)in zip(factors, powers))def divisors(num):
pf = dict(prime_factors(num))
primes = pf.keys()#For each prime, possible exponents
exponents =[range(i+1)for i in pf.values()]return(powered(primes,es)for es in product(*exponents))
Adapted from CodeReview, here is a variant which works with num=1 !
from itertools import product
import operator
def prod(ls):
return reduce(operator.mul, ls, 1)
def powered(factors, powers):
return prod(f**p for (f,p) in zip(factors, powers))
def divisors(num) :
pf = dict(prime_factors(num))
primes = pf.keys()
#For each prime, possible exponents
exponents = [range(i+1) for i in pf.values()]
return (powered(primes,es) for es in product(*exponents))
回答 7
我将添加一个稍微修改过的Anivarth版本(因为我认为它是最Python的)以供将来参考。
from math import sqrt
def divisors(n):
divs ={1,n}for i in range(2,int(sqrt(n))+1):if n%i ==0:
divs.update((i,n//i))return divs
function divisors(n)
divs :=[1]for fact in factors(n)
temp :=[]for div in divs
if fact * div notin divs
append fact * div to temp
divs := divs + temp
return divs
Assuming that the factors function returns the factors of n (for instance, factors(60) returns the list [2, 2, 3, 5]), here is a function to compute the divisors of n:
function divisors(n)
divs := [1]
for fact in factors(n)
temp := []
for div in divs
if fact * div not in divs
append fact * div to temp
divs := divs + temp
return divs
import math as m
def findfac(n):
faclist =[1]for i in range(2, int(m.sqrt(n)+2)):if n%i ==0:if i notin faclist:
faclist.append(i)if n/i notin faclist:
faclist.append(n/i)return facts
Here’s my solution. It seems to be dumb but works well…and I was trying to find all proper divisors so the loop started from i = 2.
import math as m
def findfac(n):
faclist = [1]
for i in range(2, int(m.sqrt(n) + 2)):
if n%i == 0:
if i not in faclist:
faclist.append(i)
if n/i not in faclist:
faclist.append(n/i)
return facts
回答 11
如果您只在乎使用列表推导,对您而言别无其他!
from itertools import combinations
from functools import reduce
def get_devisors(n):
f =[f for f,e in list(factorGenerator(n))for i in range(e)]
fc =[x for l in range(len(f)+1)for x in combinations(f, l)]
devisors =[1if c==()else reduce((lambda x, y: x * y), c)for c in set(fc)]return sorted(devisors)
If you only care about using list comprehensions and nothing else matters to you!
from itertools import combinations
from functools import reduce
def get_devisors(n):
f = [f for f,e in list(factorGenerator(n)) for i in range(e)]
fc = [x for l in range(len(f)+1) for x in combinations(f, l)]
devisors = [1 if c==() else reduce((lambda x, y: x * y), c) for c in set(fc)]
return sorted(devisors)
回答 12
如果您的PC拥有大量内存,那么使用numpy可以使单个行足够快:
N =10000000; tst = np.arange(1, N); tst[np.mod(N, tst)==0]Out:
array([1,2,4,5,8,10,16,20,25,32,40,50,64,80,100,125,128,160,200,250,320,400,500,625,640,800,1000,1250,1600,2000,2500,3125,3200,4000,5000,6250,8000,10000,12500,15625,16000,20000,25000,31250,40000,50000,62500,78125,80000,100000,125000,156250,200000,250000,312500,400000,500000,625000,1000000,1250000,2000000,2500000,5000000])
Not the most efficient, but straight forward and concise:
if len(x) > len(set(x)):
pass # do something
Probably won’t make much of a difference for short lists.
回答 1
这里有两个班轮,它们也会提前退出:
>>>def allUnique(x):... seen = set()...returnnot any(i in seen or seen.add(i)for i in x)...>>> allUnique("ABCDEF")True>>> allUnique("ABACDEF")False
如果x的元素不可散列,那么您将不得不使用以下列表seen:
>>>def allUnique(x):... seen = list()...returnnot any(i in seen or seen.append(i)for i in x)...>>> allUnique([list("ABC"), list("DEF")])True>>> allUnique([list("ABC"), list("DEF"), list("ABC")])False
>>> def allUnique(x):
... seen = set()
... return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False
If the elements of x aren’t hashable, then you’ll have to resort to using a list for seen:
>>> def allUnique(x):
... seen = list()
... return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
回答 2
提前退出的解决方案可能是
def unique_values(g):
s = set()for x in g:if x in s:returnFalse
s.add(x)returnTrue
def f5(seq, idfun=None):# order preservingif idfun isNone:def idfun(x):return x
seen ={}
result =[]for item in seq:
marker = idfun(item)# in old Python versions:# if seen.has_key(marker)# but in new ones:if marker in seen:continue
seen[marker]=1
result.append(item)return result
You can use Yan’s syntax (len(x) > len(set(x))), but instead of set(x), define a function:
def f5(seq, idfun=None):
# order preserving
if idfun is None:
def idfun(x): return x
seen = {}
result = []
for item in seq:
marker = idfun(item)
# in old Python versions:
# if seen.has_key(marker)
# but in new ones:
if marker in seen: continue
seen[marker] = 1
result.append(item)
return result
and do len(x) > len(f5(x)). This will be fast and is also order preserving.
Sure! The code’s here, starting with function islt and proceeding for QUITE a while;-). As Chris’s comment suggests, it’s C code. You’ll also want to read this text file for a textual explanation, results, etc etc.
If you prefer reading Java code than C code, you could look at Joshua Bloch’s implementation of timsort in and for Java (Joshua’s also the guy who implemented, in 1997, the modified mergesort that’s still used in Java, and one can hope that Java will eventually switch to his recent port of timsort).
Some explanation of the Java port of timsort is here, the diff is here (with pointers to all needed files), the key file is here — FWIW, while I’m a better C programmer than Java programmer, in this case I find Joshua’s Java code more readable overall than Tim’s C code;-).
In early python-versions, the sort function implemented a modified version of quicksort.
However, it was deemed unstable and as of 2.3 they switched to using an adaptive mergesort algorithm.
Here is my simple recursive implementation of binary search tree.
#!/usr/bin/python
class Node:
def __init__(self, val):
self.l = None
self.r = None
self.v = val
class Tree:
def __init__(self):
self.root = None
def getRoot(self):
return self.root
def add(self, val):
if self.root is None:
self.root = Node(val)
else:
self._add(val, self.root)
def _add(self, val, node):
if val < node.v:
if node.l is not None:
self._add(val, node.l)
else:
node.l = Node(val)
else:
if node.r is not None:
self._add(val, node.r)
else:
node.r = Node(val)
def find(self, val):
if self.root is not None:
return self._find(val, self.root)
else:
return None
def _find(self, val, node):
if val == node.v:
return node
elif (val < node.v and node.l is not None):
self._find(val, node.l)
elif (val > node.v and node.r is not None):
self._find(val, node.r)
def deleteTree(self):
# garbage collector will do this for us.
self.root = None
def printTree(self):
if self.root is not None:
self._printTree(self.root)
def _printTree(self, node):
if node is not None:
self._printTree(node.l)
print(str(node.v) + ' ')
self._printTree(node.r)
# 3
# 0 4
# 2 8
tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print(tree.find(3).v)
print(tree.find(10))
tree.deleteTree()
tree.printTree()
回答 1
# simple binary tree# in this implementation, a node is inserted between an existing node and the rootclassBinaryTree():def __init__(self,rootid):
self.left =None
self.right =None
self.rootid = rootid
def getLeftChild(self):return self.left
def getRightChild(self):return self.right
def setNodeValue(self,value):
self.rootid = value
def getNodeValue(self):return self.rootid
def insertRight(self,newNode):if self.right ==None:
self.right =BinaryTree(newNode)else:
tree =BinaryTree(newNode)
tree.right = self.right
self.right = tree
def insertLeft(self,newNode):if self.left ==None:
self.left =BinaryTree(newNode)else:
tree =BinaryTree(newNode)
tree.left = self.left
self.left = tree
def printTree(tree):if tree !=None:
printTree(tree.getLeftChild())print(tree.getNodeValue())
printTree(tree.getRightChild())# test treedef testTree():
myTree =BinaryTree("Maud")
myTree.insertLeft("Bob")
myTree.insertRight("Tony")
myTree.insertRight("Steven")
printTree(myTree)
[What you need for interviews] A Node class is the sufficient data structure to represent a binary tree.
(While other answers are mostly correct, they are not required for a binary tree: no need to extend object class, no need to be a BST, no need to import deque).
class Node:
def __init__(self, value = None):
self.left = None
self.right = None
self.value = value
def _add(node, v):
new =[v,[],[]]if node:
left, right = node[1:]ifnot left:
left.extend(new)elifnot right:
right.extend(new)else:
_add(left, v)else:
node.extend(new)def binary_tree(s):
root =[]for e in s:
_add(root, e)return root
def traverse(n, order):if n:
v = n[0]if order =='pre':yield v
for left in traverse(n[1], order):yield left
if order =='in':yield v
for right in traverse(n[2], order):yield right
if order =='post':yield v
从可迭代构造树:
>>> tree = binary_tree('A B C D E'.split())>>>print tree
['A',['B',['D',[],[]],['E',[],[]]],['C',[],[]]]
A very quick ‘n dirty way of implementing a binary tree using lists.
Not the most efficient, nor does it handle nil values all too well.
But it’s very transparent (at least to me):
def _add(node, v):
new = [v, [], []]
if node:
left, right = node[1:]
if not left:
left.extend(new)
elif not right:
right.extend(new)
else:
_add(left, v)
else:
node.extend(new)
def binary_tree(s):
root = []
for e in s:
_add(root, e)
return root
def traverse(n, order):
if n:
v = n[0]
if order == 'pre':
yield v
for left in traverse(n[1], order):
yield left
if order == 'in':
yield v
for right in traverse(n[2], order):
yield right
if order == 'post':
yield v
Constructing a tree from an iterable:
>>> tree = binary_tree('A B C D E'.split())
>>> print tree
['A', ['B', ['D', [], []], ['E', [], []]], ['C', [], []]]
I can’t help but notice that most answers here are implementing a Binary Search Tree. Binary Search Tree != Binary Tree.
A Binary Search Tree has a very specific property: for any node X, X’s key is larger than the key of any descendent of its left child, and smaller than the key of any descendant of its right child.
A Binary Tree imposes no such restriction. A Binary Tree is simply a data structure with a ‘key’ element, and two children, say ‘left’ and ‘right’.
A Tree is an even more general case of a Binary Tree where each node can have an arbitrary number of children. Typically, each node has a ‘children’ element which is of type list/array.
Now, to answer the OP’s question, I am including a full implementation of a Binary Tree in Python. The underlying data structure storing each BinaryTreeNode is a dictionary, given it offers optimal O(1) lookups. I’ve also implemented depth-first and breadth-first traversals. These are very common operations performed on trees.
from collections import deque
class BinaryTreeNode:
def __init__(self, key, left=None, right=None):
self.key = key
self.left = left
self.right = right
def __repr__(self):
return "%s l: (%s) r: (%s)" % (self.key, self.left, self.right)
def __eq__(self, other):
if self.key == other.key and \
self.right == other.right and \
self.left == other.left:
return True
else:
return False
class BinaryTree:
def __init__(self, root_key=None):
# maps from BinaryTreeNode key to BinaryTreeNode instance.
# Thus, BinaryTreeNode keys must be unique.
self.nodes = {}
if root_key is not None:
# create a root BinaryTreeNode
self.root = BinaryTreeNode(root_key)
self.nodes[root_key] = self.root
def add(self, key, left_key=None, right_key=None):
if key not in self.nodes:
# BinaryTreeNode with given key does not exist, create it
self.nodes[key] = BinaryTreeNode(key)
# invariant: self.nodes[key] exists
# handle left child
if left_key is None:
self.nodes[key].left = None
else:
if left_key not in self.nodes:
self.nodes[left_key] = BinaryTreeNode(left_key)
# invariant: self.nodes[left_key] exists
self.nodes[key].left = self.nodes[left_key]
# handle right child
if right_key == None:
self.nodes[key].right = None
else:
if right_key not in self.nodes:
self.nodes[right_key] = BinaryTreeNode(right_key)
# invariant: self.nodes[right_key] exists
self.nodes[key].right = self.nodes[right_key]
def remove(self, key):
if key not in self.nodes:
raise ValueError('%s not in tree' % key)
# remove key from the list of nodes
del self.nodes[key]
# if node removed is left/right child, update parent node
for k in self.nodes:
if self.nodes[k].left and self.nodes[k].left.key == key:
self.nodes[k].left = None
if self.nodes[k].right and self.nodes[k].right.key == key:
self.nodes[k].right = None
return True
def _height(self, node):
if node is None:
return 0
else:
return 1 + max(self._height(node.left), self._height(node.right))
def height(self):
return self._height(self.root)
def size(self):
return len(self.nodes)
def __repr__(self):
return str(self.traverse_inorder(self.root))
def bfs(self, node):
if not node or node not in self.nodes:
return
reachable = []
q = deque()
# add starting node to queue
q.append(node)
while len(q):
visit = q.popleft()
# add currently visited BinaryTreeNode to list
reachable.append(visit)
# add left/right children as needed
if visit.left:
q.append(visit.left)
if visit.right:
q.append(visit.right)
return reachable
# visit left child, root, then right child
def traverse_inorder(self, node, reachable=None):
if not node or node.key not in self.nodes:
return
if reachable is None:
reachable = []
self.traverse_inorder(node.left, reachable)
reachable.append(node.key)
self.traverse_inorder(node.right, reachable)
return reachable
# visit left and right children, then root
# root of tree is always last to be visited
def traverse_postorder(self, node, reachable=None):
if not node or node.key not in self.nodes:
return
if reachable is None:
reachable = []
self.traverse_postorder(node.left, reachable)
self.traverse_postorder(node.right, reachable)
reachable.append(node.key)
return reachable
# visit root, left, then right children
# root is always visited first
def traverse_preorder(self, node, reachable=None):
if not node or node.key not in self.nodes:
return
if reachable is None:
reachable = []
reachable.append(node.key)
self.traverse_preorder(node.left, reachable)
self.traverse_preorder(node.right, reachable)
return reachable
回答 6
你不需要两节课
classTree:
val =None
left =None
right =Nonedef __init__(self, val):
self.val = val
def insert(self, val):if self.val isnotNone:if val < self.val:if self.left isnotNone:
self.left.insert(val)else:
self.left =Tree(val)elif val > self.val:if self.right isnotNone:
self.right.insert(val)else:
self.right =Tree(val)else:returnelse:
self.val = val
print("new node added")def showTree(self):if self.left isnotNone:
self.left.showTree()print(self.val, end =' ')if self.right isnotNone:
self.right.showTree()
class Tree:
val = None
left = None
right = None
def __init__(self, val):
self.val = val
def insert(self, val):
if self.val is not None:
if val < self.val:
if self.left is not None:
self.left.insert(val)
else:
self.left = Tree(val)
elif val > self.val:
if self.right is not None:
self.right.insert(val)
else:
self.right = Tree(val)
else:
return
else:
self.val = val
print("new node added")
def showTree(self):
if self.left is not None:
self.left.showTree()
print(self.val, end = ' ')
if self.right is not None:
self.right.showTree()
import collections as ct
def traverse(tree):"""Yield nodes from a tree via BFS."""
q = ct.deque()# 1
root = next(iter(tree))# 2
q.append(root)while q:
node = q.popleft()
children = filter(None, tree.get(node))for n in children:# 3
q.append(n)yield node
Each key-value pair is a unique node pointing to its children.
A list (or tuple) holds an ordered pair of left/right children.
With a dict having ordered insertion, assume the first entry is the root.
Common methods can be functions that mutate or traverse the dict (see find_all_paths()).
Tree-based functions often include the following common operations:
traverse: yield each node in a given order (usually left-to-right)
breadth-first search (BFS): traverse levels
depth-first search (DFS): traverse branches first (pre-/in-/post-order)
insert: add a node to the tree depending on the number of children
remove: remove a node depending on the number of children
update: merge missing nodes from one tree to the other
visit: yield the value of a traversed node
Try implementing all of these operations.
Here we demonstrate one of these functions – a BFS traversal:
Example
import collections as ct
def traverse(tree):
"""Yield nodes from a tree via BFS."""
q = ct.deque() # 1
root = next(iter(tree)) # 2
q.append(root)
while q:
node = q.popleft()
children = filter(None, tree.get(node))
for n in children: # 3
q.append(n)
yield node
Something great about traversals in general, we can easily alter the latter iterative approach to depth-first search (DFS) by simply replacing the queue with a stack (a.k.a LIFO Queue). This simply means we dequeue from the same side that we enqueue. DFS allows us to search each branch.
How? Since we are using a deque, we can emulate a stack by changing node = q.popleft() to node = q.pop() (right). The result is a right-favored, pre-ordered DFS: ['a', 'c', 'f', 'b', 'e', 'd'].
import random
class TreeNode:
def __init__(self, key):
self.key = key
self.left = None
self.right = None
self.p = None
class BinaryTree:
def __init__(self):
self.root = None
def length(self):
return self.size
def inorder(self, node):
if node == None:
return None
else:
self.inorder(node.left)
print node.key,
self.inorder(node.right)
def search(self, k):
node = self.root
while node != None:
if node.key == k:
return node
if node.key > k:
node = node.left
else:
node = node.right
return None
def minimum(self, node):
x = None
while node.left != None:
x = node.left
node = node.left
return x
def maximum(self, node):
x = None
while node.right != None:
x = node.right
node = node.right
return x
def successor(self, node):
parent = None
if node.right != None:
return self.minimum(node.right)
parent = node.p
while parent != None and node == parent.right:
node = parent
parent = parent.p
return parent
def predecessor(self, node):
parent = None
if node.left != None:
return self.maximum(node.left)
parent = node.p
while parent != None and node == parent.left:
node = parent
parent = parent.p
return parent
def insert(self, k):
t = TreeNode(k)
parent = None
node = self.root
while node != None:
parent = node
if node.key > t.key:
node = node.left
else:
node = node.right
t.p = parent
if parent == None:
self.root = t
elif t.key < parent.key:
parent.left = t
else:
parent.right = t
return t
def delete(self, node):
if node.left == None:
self.transplant(node, node.right)
elif node.right == None:
self.transplant(node, node.left)
else:
succ = self.minimum(node.right)
if succ.p != node:
self.transplant(succ, succ.right)
succ.right = node.right
succ.right.p = succ
self.transplant(node, succ)
succ.left = node.left
succ.left.p = succ
def transplant(self, node, newnode):
if node.p == None:
self.root = newnode
elif node == node.p.left:
node.p.left = newnode
else:
node.p.right = newnode
if newnode != None:
newnode.p = node.p
回答 11
此实现支持插入,查找和删除操作,而不会破坏树的结构。这不是平衡树。
# Class for construct the nodes of the tree. (Subtrees)classNode:def __init__(self, key, parent_node =None):
self.left =None
self.right =None
self.key = key
if parent_node ==None:
self.parent = self
else:
self.parent = parent_node
# Class with the structure of the tree. # This Tree is not balanced.classTree:def __init__(self):
self.root =None# Insert a single elementdef insert(self, x):if(self.root ==None):
self.root =Node(x)else:
self._insert(x, self.root)def _insert(self, x, node):if(x < node.key):if(node.left ==None):
node.left =Node(x, node)else:
self._insert(x, node.left)else:if(node.right ==None):
node.right =Node(x, node)else:
self._insert(x, node.right)# Given a element, return a node in the tree with key x. def find(self, x):if(self.root ==None):returnNoneelse:return self._find(x, self.root)def _find(self, x, node):if(x == node.key):return node
elif(x < node.key):if(node.left ==None):returnNoneelse:return self._find(x, node.left)elif(x > node.key):if(node.right ==None):returnNoneelse:return self._find(x, node.right)# Given a node, return the node in the tree with the next largest element.def next(self, node):if node.right !=None:return self._left_descendant(node.right)else:return self._right_ancestor(node)def _left_descendant(self, node):if node.left ==None:return node
else:return self._left_descendant(node.left)def _right_ancestor(self, node):if node.key <= node.parent.key:return node.parent
else:return self._right_ancestor(node.parent)# Delete an element of the treedef delete(self, x):
node = self.find(x)if node ==None:print(x,"isn't in the tree")else:if node.right ==None:if node.left ==None:if node.key < node.parent.key:
node.parent.left =Nonedel node # Clean garbageelse:
node.parent.right =NonedelNode# Clean garbageelse:
node.key = node.left.key
node.left =Noneelse:
x = self.next(node)
node.key = x.key
x =None# tests
t =Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)
t.delete(8)
t.delete(5)
t.insert(9)
t.insert(1)
t.delete(2)
t.delete(100)# Remember: Find method return the node object. # To return a number use t.find(nº).key# But it will cause an error if the number is not in the tree.print(t.find(5))print(t.find(8))print(t.find(4))print(t.find(6))print(t.find(9))
This implementation supports insert, find and delete operations without destroy the structure of the tree. This is not a banlanced tree.
# Class for construct the nodes of the tree. (Subtrees)
class Node:
def __init__(self, key, parent_node = None):
self.left = None
self.right = None
self.key = key
if parent_node == None:
self.parent = self
else:
self.parent = parent_node
# Class with the structure of the tree.
# This Tree is not balanced.
class Tree:
def __init__(self):
self.root = None
# Insert a single element
def insert(self, x):
if(self.root == None):
self.root = Node(x)
else:
self._insert(x, self.root)
def _insert(self, x, node):
if(x < node.key):
if(node.left == None):
node.left = Node(x, node)
else:
self._insert(x, node.left)
else:
if(node.right == None):
node.right = Node(x, node)
else:
self._insert(x, node.right)
# Given a element, return a node in the tree with key x.
def find(self, x):
if(self.root == None):
return None
else:
return self._find(x, self.root)
def _find(self, x, node):
if(x == node.key):
return node
elif(x < node.key):
if(node.left == None):
return None
else:
return self._find(x, node.left)
elif(x > node.key):
if(node.right == None):
return None
else:
return self._find(x, node.right)
# Given a node, return the node in the tree with the next largest element.
def next(self, node):
if node.right != None:
return self._left_descendant(node.right)
else:
return self._right_ancestor(node)
def _left_descendant(self, node):
if node.left == None:
return node
else:
return self._left_descendant(node.left)
def _right_ancestor(self, node):
if node.key <= node.parent.key:
return node.parent
else:
return self._right_ancestor(node.parent)
# Delete an element of the tree
def delete(self, x):
node = self.find(x)
if node == None:
print(x, "isn't in the tree")
else:
if node.right == None:
if node.left == None:
if node.key < node.parent.key:
node.parent.left = None
del node # Clean garbage
else:
node.parent.right = None
del Node # Clean garbage
else:
node.key = node.left.key
node.left = None
else:
x = self.next(node)
node.key = x.key
x = None
# tests
t = Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)
t.delete(8)
t.delete(5)
t.insert(9)
t.insert(1)
t.delete(2)
t.delete(100)
# Remember: Find method return the node object.
# To return a number use t.find(nº).key
# But it will cause an error if the number is not in the tree.
print(t.find(5))
print(t.find(8))
print(t.find(4))
print(t.find(6))
print(t.find(9))
I know many good solutions have already been posted but I usually have a different approach for binary trees: going with some Node class and implementing it directly is more readable but when you have a lot of nodes it can become very greedy regarding memory, so I suggest adding one layer of complexity and storing the nodes in a python list, and then simulating a tree behavior using only the list.
You can still define a Node class to finally represent the nodes in the tree when needed, but keeping them in a simple form [value, left, right] in a list will use half the memory or less!
Here is a quick example of a Binary Search Tree class storing the nodes in an array. It provides basic fonctions such as add, remove, find…
"""
Basic Binary Search Tree class without recursion...
"""
__author__ = "@fbparis"
class Node(object):
__slots__ = "value", "parent", "left", "right"
def __init__(self, value, parent=None, left=None, right=None):
self.value = value
self.parent = parent
self.left = left
self.right = right
def __repr__(self):
return "<%s object at %s: parent=%s, left=%s, right=%s, value=%s>" % (self.__class__.__name__, hex(id(self)), self.parent, self.left, self.right, self.value)
class BinarySearchTree(object):
__slots__ = "_tree"
def __init__(self, *args):
self._tree = []
if args:
for x in args[0]:
self.add(x)
def __len__(self):
return len(self._tree)
def __repr__(self):
return "<%s object at %s with %d nodes>" % (self.__class__.__name__, hex(id(self)), len(self))
def __str__(self, nodes=None, level=0):
ret = ""
if nodes is None:
if len(self):
nodes = [0]
else:
nodes = []
for node in nodes:
if node is None:
continue
ret += "-" * level + " %s\n" % self._tree[node][0]
ret += self.__str__(self._tree[node][2:4], level + 1)
if level == 0:
ret = ret.strip()
return ret
def __contains__(self, value):
if len(self):
node_index = 0
while self._tree[node_index][0] != value:
if value < self._tree[node_index][0]:
node_index = self._tree[node_index][2]
else:
node_index = self._tree[node_index][3]
if node_index is None:
return False
return True
return False
def __eq__(self, other):
return self._tree == other._tree
def add(self, value):
if len(self):
node_index = 0
while self._tree[node_index][0] != value:
if value < self._tree[node_index][0]:
b = self._tree[node_index][2]
k = 2
else:
b = self._tree[node_index][3]
k = 3
if b is None:
self._tree[node_index][k] = len(self)
self._tree.append([value, node_index, None, None])
break
node_index = b
else:
self._tree.append([value, None, None, None])
def remove(self, value):
if len(self):
node_index = 0
while self._tree[node_index][0] != value:
if value < self._tree[node_index][0]:
node_index = self._tree[node_index][2]
else:
node_index = self._tree[node_index][3]
if node_index is None:
raise KeyError
if self._tree[node_index][2] is not None:
b, d = 2, 3
elif self._tree[node_index][3] is not None:
b, d = 3, 2
else:
i = node_index
b = None
if b is not None:
i = self._tree[node_index][b]
while self._tree[i][d] is not None:
i = self._tree[i][d]
p = self._tree[i][1]
b = self._tree[i][b]
if p == node_index:
self._tree[p][5-d] = b
else:
self._tree[p][d] = b
if b is not None:
self._tree[b][1] = p
self._tree[node_index][0] = self._tree[i][0]
else:
p = self._tree[i][1]
if p is not None:
if self._tree[p][2] == i:
self._tree[p][2] = None
else:
self._tree[p][3] = None
last = self._tree.pop()
n = len(self)
if i < n:
self._tree[i] = last[:]
if last[2] is not None:
self._tree[last[2]][1] = i
if last[3] is not None:
self._tree[last[3]][1] = i
if self._tree[last[1]][2] == n:
self._tree[last[1]][2] = i
else:
self._tree[last[1]][3] = i
else:
raise KeyError
def find(self, value):
if len(self):
node_index = 0
while self._tree[node_index][0] != value:
if value < self._tree[node_index][0]:
node_index = self._tree[node_index][2]
else:
node_index = self._tree[node_index][3]
if node_index is None:
return None
return Node(*self._tree[node_index])
return None
I’ve added a parent attribute so that you can remove any node and maintain the BST structure.
Sorry for the readability, especially for the “remove” function. Basically, when a node is removed, we pop the tree array and replace it with the last element (except if we wanted to remove the last node). To maintain the BST structure, the removed node is replaced with the max of its left children or the min of its right children and some operations have to be done in order to keep the indexes valid but it’s fast enough.
I used this technique for more advanced stuff to build some big words dictionaries with an internal radix trie and I was able to divide memory consumption by 7-8 (you can see an example here: https://gist.github.com/fbparis/b3ddd5673b603b42c880974b23db7cda)
'''
A binary search Tree
'''from __future__ import print_function
classNode:def __init__(self, label, parent):
self.label = label
self.left =None
self.right =None#Added in order to delete a node easier
self.parent = parent
def getLabel(self):return self.label
def setLabel(self, label):
self.label = label
def getLeft(self):return self.left
def setLeft(self, left):
self.left = left
def getRight(self):return self.right
def setRight(self, right):
self.right = right
def getParent(self):return self.parent
def setParent(self, parent):
self.parent = parent
classBinarySearchTree:def __init__(self):
self.root =Nonedef insert(self, label):# Create a new Node
new_node =Node(label,None)# If Tree is emptyif self.empty():
self.root = new_node
else:#If Tree is not empty
curr_node = self.root
#While we don't get to a leafwhile curr_node isnotNone:#We keep reference of the parent node
parent_node = curr_node
#If node label is less than current nodeif new_node.getLabel()< curr_node.getLabel():#We go left
curr_node = curr_node.getLeft()else:#Else we go right
curr_node = curr_node.getRight()#We insert the new node in a leafif new_node.getLabel()< parent_node.getLabel():
parent_node.setLeft(new_node)else:
parent_node.setRight(new_node)#Set parent to the new node
new_node.setParent(parent_node)def delete(self, label):if(not self.empty()):#Look for the node with that label
node = self.getNode(label)#If the node existsif(node isnotNone):#If it has no childrenif(node.getLeft()isNoneand node.getRight()isNone):
self.__reassignNodes(node,None)
node =None#Has only right childrenelif(node.getLeft()isNoneand node.getRight()isnotNone):
self.__reassignNodes(node, node.getRight())#Has only left childrenelif(node.getLeft()isnotNoneand node.getRight()isNone):
self.__reassignNodes(node, node.getLeft())#Has two childrenelse:#Gets the max value of the left branch
tmpNode = self.getMax(node.getLeft())#Deletes the tmpNode
self.delete(tmpNode.getLabel())#Assigns the value to the node to delete and keesp tree structure
node.setLabel(tmpNode.getLabel())def getNode(self, label):
curr_node =None#If the tree is not emptyif(not self.empty()):#Get tree root
curr_node = self.getRoot()#While we don't find the node we look for#I am using lazy evaluation here to avoid NoneType Attribute errorwhile curr_node isnotNoneand curr_node.getLabel()isnot label:#If node label is less than current nodeif label < curr_node.getLabel():#We go left
curr_node = curr_node.getLeft()else:#Else we go right
curr_node = curr_node.getRight()return curr_node
def getMax(self, root =None):if(root isnotNone):
curr_node = root
else:#We go deep on the right branch
curr_node = self.getRoot()if(not self.empty()):while(curr_node.getRight()isnotNone):
curr_node = curr_node.getRight()return curr_node
def getMin(self, root =None):if(root isnotNone):
curr_node = root
else:#We go deep on the left branch
curr_node = self.getRoot()if(not self.empty()):
curr_node = self.getRoot()while(curr_node.getLeft()isnotNone):
curr_node = curr_node.getLeft()return curr_node
def empty(self):if self.root isNone:returnTruereturnFalsedef __InOrderTraversal(self, curr_node):
nodeList =[]if curr_node isnotNone:
nodeList.insert(0, curr_node)
nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())return nodeList
def getRoot(self):return self.root
def __isRightChildren(self, node):if(node == node.getParent().getRight()):returnTruereturnFalsedef __reassignNodes(self, node, newChildren):if(newChildren isnotNone):
newChildren.setParent(node.getParent())if(node.getParent()isnotNone):#If it is the Right Childrenif(self.__isRightChildren(node)):
node.getParent().setRight(newChildren)else:#Else it is the left children
node.getParent().setLeft(newChildren)#This function traversal the tree. By default it returns an#In order traversal list. You can pass a function to traversal#The tree as needed by client codedef traversalTree(self, traversalFunction =None, root =None):if(traversalFunction isNone):#Returns a list of nodes in preOrder by defaultreturn self.__InOrderTraversal(self.root)else:#Returns a list of nodes in the order that the users wants toreturn traversalFunction(self.root)#Returns an string of all the nodes labels in the list #In Order Traversaldef __str__(self):
list = self.__InOrderTraversal(self.root)
str =""for x in list:
str = str +" "+ x.getLabel().__str__()return str
defInPreOrder(curr_node):
nodeList =[]if curr_node isnotNone:
nodeList = nodeList +InPreOrder(curr_node.getLeft())
nodeList.insert(0, curr_node.getLabel())
nodeList = nodeList +InPreOrder(curr_node.getRight())return nodeList
def testBinarySearchTree():
r'''
Example
8
/ \
3 10
/ \ \
1 6 14
/ \ /
4 7 13
'''
r'''
Example After Deletion
7
/ \
1 4
'''
t =BinarySearchTree()
t.insert(8)
t.insert(3)
t.insert(6)
t.insert(1)
t.insert(10)
t.insert(14)
t.insert(13)
t.insert(4)
t.insert(7)#Prints all the elements of the list in order traversalprint(t.__str__())if(t.getNode(6)isnotNone):print("The label 6 exists")else:print("The label 6 doesn't exist")if(t.getNode(-1)isnotNone):print("The label -1 exists")else:print("The label -1 doesn't exist")if(not t.empty()):print(("Max Value: ", t.getMax().getLabel()))print(("Min Value: ", t.getMin().getLabel()))
t.delete(13)
t.delete(10)
t.delete(8)
t.delete(3)
t.delete(6)
t.delete(14)#Gets all the elements of the tree In pre order#And it prints them
list = t.traversalTree(InPreOrder, t.root)for x in list:print(x)if __name__ =="__main__":
testBinarySearchTree()
A good implementation of binary search tree, taken from here:
'''
A binary search Tree
'''
from __future__ import print_function
class Node:
def __init__(self, label, parent):
self.label = label
self.left = None
self.right = None
#Added in order to delete a node easier
self.parent = parent
def getLabel(self):
return self.label
def setLabel(self, label):
self.label = label
def getLeft(self):
return self.left
def setLeft(self, left):
self.left = left
def getRight(self):
return self.right
def setRight(self, right):
self.right = right
def getParent(self):
return self.parent
def setParent(self, parent):
self.parent = parent
class BinarySearchTree:
def __init__(self):
self.root = None
def insert(self, label):
# Create a new Node
new_node = Node(label, None)
# If Tree is empty
if self.empty():
self.root = new_node
else:
#If Tree is not empty
curr_node = self.root
#While we don't get to a leaf
while curr_node is not None:
#We keep reference of the parent node
parent_node = curr_node
#If node label is less than current node
if new_node.getLabel() < curr_node.getLabel():
#We go left
curr_node = curr_node.getLeft()
else:
#Else we go right
curr_node = curr_node.getRight()
#We insert the new node in a leaf
if new_node.getLabel() < parent_node.getLabel():
parent_node.setLeft(new_node)
else:
parent_node.setRight(new_node)
#Set parent to the new node
new_node.setParent(parent_node)
def delete(self, label):
if (not self.empty()):
#Look for the node with that label
node = self.getNode(label)
#If the node exists
if(node is not None):
#If it has no children
if(node.getLeft() is None and node.getRight() is None):
self.__reassignNodes(node, None)
node = None
#Has only right children
elif(node.getLeft() is None and node.getRight() is not None):
self.__reassignNodes(node, node.getRight())
#Has only left children
elif(node.getLeft() is not None and node.getRight() is None):
self.__reassignNodes(node, node.getLeft())
#Has two children
else:
#Gets the max value of the left branch
tmpNode = self.getMax(node.getLeft())
#Deletes the tmpNode
self.delete(tmpNode.getLabel())
#Assigns the value to the node to delete and keesp tree structure
node.setLabel(tmpNode.getLabel())
def getNode(self, label):
curr_node = None
#If the tree is not empty
if(not self.empty()):
#Get tree root
curr_node = self.getRoot()
#While we don't find the node we look for
#I am using lazy evaluation here to avoid NoneType Attribute error
while curr_node is not None and curr_node.getLabel() is not label:
#If node label is less than current node
if label < curr_node.getLabel():
#We go left
curr_node = curr_node.getLeft()
else:
#Else we go right
curr_node = curr_node.getRight()
return curr_node
def getMax(self, root = None):
if(root is not None):
curr_node = root
else:
#We go deep on the right branch
curr_node = self.getRoot()
if(not self.empty()):
while(curr_node.getRight() is not None):
curr_node = curr_node.getRight()
return curr_node
def getMin(self, root = None):
if(root is not None):
curr_node = root
else:
#We go deep on the left branch
curr_node = self.getRoot()
if(not self.empty()):
curr_node = self.getRoot()
while(curr_node.getLeft() is not None):
curr_node = curr_node.getLeft()
return curr_node
def empty(self):
if self.root is None:
return True
return False
def __InOrderTraversal(self, curr_node):
nodeList = []
if curr_node is not None:
nodeList.insert(0, curr_node)
nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())
return nodeList
def getRoot(self):
return self.root
def __isRightChildren(self, node):
if(node == node.getParent().getRight()):
return True
return False
def __reassignNodes(self, node, newChildren):
if(newChildren is not None):
newChildren.setParent(node.getParent())
if(node.getParent() is not None):
#If it is the Right Children
if(self.__isRightChildren(node)):
node.getParent().setRight(newChildren)
else:
#Else it is the left children
node.getParent().setLeft(newChildren)
#This function traversal the tree. By default it returns an
#In order traversal list. You can pass a function to traversal
#The tree as needed by client code
def traversalTree(self, traversalFunction = None, root = None):
if(traversalFunction is None):
#Returns a list of nodes in preOrder by default
return self.__InOrderTraversal(self.root)
else:
#Returns a list of nodes in the order that the users wants to
return traversalFunction(self.root)
#Returns an string of all the nodes labels in the list
#In Order Traversal
def __str__(self):
list = self.__InOrderTraversal(self.root)
str = ""
for x in list:
str = str + " " + x.getLabel().__str__()
return str
def InPreOrder(curr_node):
nodeList = []
if curr_node is not None:
nodeList = nodeList + InPreOrder(curr_node.getLeft())
nodeList.insert(0, curr_node.getLabel())
nodeList = nodeList + InPreOrder(curr_node.getRight())
return nodeList
def testBinarySearchTree():
r'''
Example
8
/ \
3 10
/ \ \
1 6 14
/ \ /
4 7 13
'''
r'''
Example After Deletion
7
/ \
1 4
'''
t = BinarySearchTree()
t.insert(8)
t.insert(3)
t.insert(6)
t.insert(1)
t.insert(10)
t.insert(14)
t.insert(13)
t.insert(4)
t.insert(7)
#Prints all the elements of the list in order traversal
print(t.__str__())
if(t.getNode(6) is not None):
print("The label 6 exists")
else:
print("The label 6 doesn't exist")
if(t.getNode(-1) is not None):
print("The label -1 exists")
else:
print("The label -1 doesn't exist")
if(not t.empty()):
print(("Max Value: ", t.getMax().getLabel()))
print(("Min Value: ", t.getMin().getLabel()))
t.delete(13)
t.delete(10)
t.delete(8)
t.delete(3)
t.delete(6)
t.delete(14)
#Gets all the elements of the tree In pre order
#And it prints them
list = t.traversalTree(InPreOrder, t.root)
for x in list:
print(x)
if __name__ == "__main__":
testBinarySearchTree()
回答 14
我想展示@apadana方法的一种变体,当有大量节点时,它会更有用:
'''
Suppose we have the following tree
10
/ \
11 9
/ \ / \
7 12 15 8
'''# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist =[10,11,7,9,15,8,12]; n =[]for i in range(len(nlist)): n.append(Node(nlist[i]))# Step 2 - Set each node position
n[0].left = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]
I want to show a variation of @apadana’s method, which is more useful when there is a considerable number of nodes:
'''
Suppose we have the following tree
10
/ \
11 9
/ \ / \
7 12 15 8
'''
# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist = [10,11,7,9,15,8,12]; n = []
for i in range(len(nlist)): n.append(Node(nlist[i]))
# Step 2 - Set each node position
n[0].left = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]
回答 15
classNode:"""
single Node for tree
"""def __init__(self, data):
self.data = data
self.right =None
self.left =Noneclass binaryTree:"""
binary tree implementation
"""def __init__(self):
self.root =Nonedef push(self, element, node=None):if node isNone:
node = self.root
if self.root isNone:
self.root =Node(element)else:if element < node.data:if node.left isnotNone:
self.push(element, node.left)else:
node.left =Node(element)else:if node.right isnotNone:
self.push(element, node.right)else:
node.right =Node(element)def __str__(self):
self.printInorder(self.root)return"\n"def printInorder(self, node):"""
print tree in inorder
"""if node isnotNone:
self.printInorder(node.left)print(node.data)
self.printInorder(node.right)def main():"""
Main code and logic comes here
"""
tree = binaryTree()
tree.push(5)
tree.push(3)
tree.push(1)
tree.push(3)
tree.push(0)
tree.push(2)
tree.push(9)
tree.push(10)print(tree)if __name__ =="__main__":
main()
class Node:
"""
single Node for tree
"""
def __init__(self, data):
self.data = data
self.right = None
self.left = None
class binaryTree:
"""
binary tree implementation
"""
def __init__(self):
self.root = None
def push(self, element, node=None):
if node is None:
node = self.root
if self.root is None:
self.root = Node(element)
else:
if element < node.data:
if node.left is not None:
self.push(element, node.left)
else:
node.left = Node(element)
else:
if node.right is not None:
self.push(element, node.right)
else:
node.right = Node(element)
def __str__(self):
self.printInorder(self.root)
return "\n"
def printInorder(self, node):
"""
print tree in inorder
"""
if node is not None:
self.printInorder(node.left)
print(node.data)
self.printInorder(node.right)
def main():
"""
Main code and logic comes here
"""
tree = binaryTree()
tree.push(5)
tree.push(3)
tree.push(1)
tree.push(3)
tree.push(0)
tree.push(2)
tree.push(9)
tree.push(10)
print(tree)
if __name__ == "__main__":
main()
Here is a simple solution which can be used to build a binary tree using a recursive approach to display the tree in order traversal has been used in the below code.
class Node(object):
def __init__(self):
self.left = None
self.right = None
self.value = None
@property
def get_value(self):
return self.value
@property
def get_left(self):
return self.left
@property
def get_right(self):
return self.right
@get_left.setter
def set_left(self, left_node):
self.left = left_node
@get_value.setter
def set_value(self, value):
self.value = value
@get_right.setter
def set_right(self, right_node):
self.right = right_node
def create_tree(self):
_node = Node() #creating new node.
_x = input("Enter the node data(-1 for null)")
if(_x == str(-1)): #for defining no child.
return None
_node.set_value = _x #setting the value of the node.
print("Enter the left child of {}".format(_x))
_node.set_left = self.create_tree() #setting the left subtree
print("Enter the right child of {}".format(_x))
_node.set_right = self.create_tree() #setting the right subtree.
return _node
def pre_order(self, root):
if root is not None:
print(root.get_value)
self.pre_order(root.get_left)
self.pre_order(root.get_right)
if __name__ == '__main__':
node = Node()
root_node = node.create_tree()
node.pre_order(root_node)