标签归档:algorithm

给定一百万个数字的字符串,返回所有重复的3位数字

问题:给定一百万个数字的字符串,返回所有重复的3位数字

几个月前,我在纽约接受了一家对冲基金公司的采访,不幸的是,我没有获得数据/软件工程师的实习机会。(他们还要求解决方案使用Python。)

我几乎搞砸了第一次面试的问题…

问题:给定一百万个数字的字符串(例如,Pi),编写一个函数/程序,该函数/程序返回所有重复的3位数字,并且重复次数大于1

例如:如果字符串为:123412345123456则函数/程序将返回:

123 - 3 times
234 - 3 times
345 - 2 times

在面试失败后,他们没有给我解决方案,但他们确实告诉我,解决方案的时间复杂度恒定为1000,因为所有可能的结果都介于:

000-> 999

现在我正在考虑它,我认为不可能提出一个恒定时间算法。是吗?

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked the solution to be in Python.)

I pretty much screwed up on the first interview problem…

Question: Given a string of a million numbers (Pi for example), write a function/program that returns all repeating 3 digit numbers and number of repetition greater than 1

For example: if the string was: 123412345123456 then the function/program would return:

123 - 3 times
234 - 3 times
345 - 2 times

They did not give me the solution after I failed the interview, but they did tell me that the time complexity for the solution was constant of 1000 since all the possible outcomes are between:

000 –> 999

Now that I’m thinking about it, I don’t think it’s possible to come up with a constant time algorithm. Is it?


回答 0

您轻轻松松下手,您可能不想在对量子点不了解基本算法的对冲基金中工作:-)

在这种情况下,如果您需要至少访问一次每个元素,则无法处理任意大小的数据结构O(1)。在这种情况下,您可以期望的最好是字符串的长度。O(n)n

虽然,顺便说一句,标称O(n)算法O(1)对一个固定的输入大小,这样,在技术上,他们可能已经在这里正确的。但是,这通常不是人们使用复杂度分析的方式。

在我看来,您可能会在很多方面给他们留下深刻的印象。

首先,通知他们,它是不是能够做到这一点的O(1),除非你使用上面的“犯罪嫌疑人”说理给出。

其次,通过提供Pythonic代码来展示您的精英技能,例如:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

输出:

[(123, 3), (234, 3), (345, 2)]

尽管您当然可以将输出格式修改为所需的任何格式。

最后,通过告诉他们解决方案几乎没有问题O(n),因为上面的代码在不到半秒钟的时间内即可提供一百万个数字字符串的结果。它似乎也线性地缩放,因为一个10,000,000个字符的字符串需要3.5秒,而一个100,000,000个字符的字符串需要36秒。

而且,如果他们需要的更好,则可以采用多种方法并行化此类内容,从而可以大大加快这种处理速度。

当然,由于GIL的缘故,不在单个 Python解释器中,但是您可以将字符串拆分成类似的字符(vv为了正确处理边界区域,必须用表示的重叠):

    vv
123412  vv
    123451
        5123456

您可以将它们种出以分开工作,然后再合并结果。

输入的拆分和输出的合并很可能会用小字符串(甚至可能是百万位数字的字符串)淹没任何节省的时间,但是,对于更大的数据集,这很可能会有所作为。当然,我通常的口号是:“不要猜测”


此口头禅也适用于其他可能性,例如完全绕过Python并使用可能更快的其他语言。

例如,以下C代码,在相同的硬件作为较早Python代码运行,处理一个在0.6秒万位,大致为Python代码处理的相同的时间量之一百万。换句话说,速度快:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

You got off lightly, you probably don’t want to be working for a hedge fund where the quants don’t understand basic algorithms :-)

There is no way to process an arbitrarily-sized data structure in O(1) if, as in this case, you need to visit every element at least once. The best you can hope for is O(n) in this case, where n is the length of the string.

Although, as an aside, a nominal O(n) algorithm will be O(1) for a fixed input size so, technically, they may have been correct here. However, that’s not usually how people use complexity analysis.

It appears to me you could have impressed them in a number of ways.

First, by informing them that it’s not possible to do it in O(1), unless you use the “suspect” reasoning given above.

Second, by showing your elite skills by providing Pythonic code such as:

inpStr = '123412345123456'

# O(1) array creation.
freq = [0] * 1000

# O(n) string processing.
for val in [int(inpStr[pos:pos+3]) for pos in range(len(inpStr) - 2)]:
    freq[val] += 1

# O(1) output of relevant array values.
print ([(num, freq[num]) for num in range(1000) if freq[num] > 1])

This outputs:

[(123, 3), (234, 3), (345, 2)]

though you could, of course, modify the output format to anything you desire.

And, finally, by telling them there’s almost certainly no problem with an O(n) solution, since the code above delivers results for a one-million-digit string in well under half a second. It seems to scale quite linearly as well, since a 10,000,000-character string takes 3.5 seconds and a 100,000,000-character one takes 36 seconds.

And, if they need better than that, there are ways to parallelise this sort of stuff that can greatly speed it up.

Not within a single Python interpreter of course, due to the GIL, but you could split the string into something like (overlap indicated by vv is required to allow proper processing of the boundary areas):

    vv
123412  vv
    123451
        5123456

You can farm these out to separate workers and combine the results afterwards.

The splitting of input and combining of output are likely to swamp any saving with small strings (and possibly even million-digit strings) but, for much larger data sets, it may well make a difference. My usual mantra of “measure, don’t guess” applies here, of course.


This mantra also applies to other possibilities, such as bypassing Python altogether and using a different language which may be faster.

For example, the following C code, running on the same hardware as the earlier Python code, handles a hundred million digits in 0.6 seconds, roughly the same amount of time as the Python code processed one million. In other words, much faster:

#include <stdio.h>
#include <string.h>

int main(void) {
    static char inpStr[100000000+1];
    static int freq[1000];

    // Set up test data.

    memset(inpStr, '1', sizeof(inpStr));
    inpStr[sizeof(inpStr)-1] = '\0';

    // Need at least three digits to do anything useful.

    if (strlen(inpStr) <= 2) return 0;

    // Get initial feed from first two digits, process others.

    int val = (inpStr[0] - '0') * 10 + inpStr[1] - '0';
    char *inpPtr = &(inpStr[2]);
    while (*inpPtr != '\0') {
        // Remove hundreds, add next digit as units, adjust table.

        val = (val % 100) * 10 + *inpPtr++ - '0';
        freq[val]++;
    }

    // Output (relevant part of) table.

    for (int i = 0; i < 1000; ++i)
        if (freq[i] > 1)
            printf("%3d -> %d\n", i, freq[i]);

    return 0;
}

回答 1

固定时间是不可能的。所有一百万个数字都需要至少被查看一次,因此这是时间复杂度O(n),在这种情况下,n =一百万。

对于简单的O(n)解决方案,创建一个大小为1000的数组,该数组表示每个可能的3位数字的出现次数。一次前进1位数字,第一个索引== 0,最后一个索引== 999997,并递增array [3位数字]以创建直方图(每个可能的3位数字出现的次数)。然后输出计数> 1的数组内容。

Constant time isn’t possible. All 1 million digits need to be looked at at least once, so that is a time complexity of O(n), where n = 1 million in this case.

For a simple O(n) solution, create an array of size 1000 that represents the number of occurrences of each possible 3 digit number. Advance 1 digit at a time, first index == 0, last index == 999997, and increment array[3 digit number] to create a histogram (count of occurrences for each possible 3 digit number). Then output the content of the array with counts > 1.


回答 2

一百万对于我在下面给出的答案来说很小。只期望您必须能够不间断地在面试中运行解决方案,然后以下操作将在不到两秒钟的时间内完成并给出所需的结果:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

希望访问者可以使用标准库collections.Counter类。

并行执行版本

我为此写了一篇博客文章,并提供了更多解释。

A million is small for the answer I give below. Expecting only that you have to be able to run the solution in the interview, without a pause, then The following works in less than two seconds and gives the required result:

from collections import Counter

def triple_counter(s):
    c = Counter(s[n-3: n] for n in range(3, len(s)))
    for tri, n in c.most_common():
        if n > 1:
            print('%s - %i times.' % (tri, n))
        else:
            break

if __name__ == '__main__':
    import random

    s = ''.join(random.choice('0123456789') for _ in range(1_000_000))
    triple_counter(s)

Hopefully the interviewer would be looking for use of the standard libraries collections.Counter class.

Parallel execution version

I wrote a blog post on this with more explanation.


回答 3

简单的O(n)解决方案是对每个3位数字进行计数:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

这将搜索全部100万个数字1000次。

仅遍历数字一次:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

时序显示,仅对索引进行一次迭代是使用的两倍count

The simple O(n) solution would be to count each 3-digit number:

for nr in range(1000):
    cnt = text.count('%03d' % nr)
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

This would search through all 1 million digits 1000 times.

Traversing the digits only once:

counts = [0] * 1000
for idx in range(len(text)-2):
    counts[int(text[idx:idx+3])] += 1

for nr, cnt in enumerate(counts):
    if cnt > 1:
        print '%03d is found %d times' % (nr, cnt)

Timing shows that iterating only once over the index is twice as fast as using count.


回答 4

这是“共识” O(n)算法的NumPy实现:遍历所有三元组和bin。通过遇到“ 385”,将bin加到bin [3,8,5](这是一个O(1)操作)中来完成合并。垃圾箱排列成一个10x10x10立方体。由于合并已完全矢量化,因此代码中没有循环。

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

毫不奇怪,在大型数据集上,NumPy比@Daniel的纯Python解决方案要快一点。样本输出:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

Here is a NumPy implementation of the “consensus” O(n) algorithm: walk through all triplets and bin as you go. The binning is done by upon encountering say “385”, adding one to bin[3, 8, 5] which is an O(1) operation. Bins are arranged in a 10x10x10 cube. As the binning is fully vectorized there is no loop in the code.

def setup_data(n):
    import random
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))

def f_np(text):
    # Get the data into NumPy
    import numpy as np
    a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
    # Rolling triplets
    a3 = np.lib.stride_tricks.as_strided(a, (3, a.size-2), 2*a.strides)

    bins = np.zeros((10, 10, 10), dtype=int)
    # Next line performs O(n) binning
    np.add.at(bins, tuple(a3), 1)
    # Filtering is left as an exercise
    return bins.ravel()

def f_py(text):
    counts = [0] * 1000
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    return counts

import numpy as np
import types
from timeit import timeit
for n in (10, 1000, 1000000):
    data = setup_data(n)
    ref = f_np(**data)
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.all(ref == func(**data))
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f':func, 'data':data}, number=10)*100))
        except:
            print("{:16s} apparently crashed".format(name[2:]))

Unsurprisingly, NumPy is a bit faster than @Daniel’s pure Python solution on large data sets. Sample output:

# n = 10
# np                    0.03481400 ms
# py                    0.00669330 ms
# n = 1000
# np                    0.11215360 ms
# py                    0.34836530 ms
# n = 1000000
# np                   82.46765980 ms
# py                  360.51235450 ms

回答 5

我将解决以下问题:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

应用于示例字符串,将生成:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

该解决方案在O(n)中运行,因为n是提供的字符串的长度,并且我认为这是您可以获得的最佳结果。

I would solve the problem as follows:

def find_numbers(str_num):
    final_dict = {}
    buffer = {}
    for idx in range(len(str_num) - 3):
        num = int(str_num[idx:idx + 3])
        if num not in buffer:
            buffer[num] = 0
        buffer[num] += 1
        if buffer[num] > 1:
            final_dict[num] = buffer[num]
    return final_dict

Applied to your example string, this yields:

>>> find_numbers("123412345123456")
{345: 2, 234: 3, 123: 3}

This solution runs in O(n) for n being the length of the provided string, and is, I guess, the best you can get.


回答 6

根据我的理解,您无法在固定时间内获得解决方案。至少需要通过一百万个数字(假设它是一个字符串)。您可以对百万个长度数字的位数进行三位数的滚动迭代,如果哈希键已经存在,则将其增加1;如果哈希密钥不存在,则创建一个新的哈希键(由值1初始化)。词典。

该代码将如下所示:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

您可以筛选出项值大于1的键。

As per my understanding, you cannot have the solution in a constant time. It will take at least one pass over the million digit number (assuming its a string). You can have a 3-digit rolling iteration over the digits of the million length number and increase the value of hash key by 1 if it already exists or create a new hash key (initialized by value 1) if it doesn’t exists already in the dictionary.

The code will look something like this:

def calc_repeating_digits(number):

    hash = {}

    for i in range(len(str(number))-2):

        current_three_digits = number[i:i+3]
        if current_three_digits in hash.keys():
            hash[current_three_digits] += 1

        else:
            hash[current_three_digits] = 1

    return hash

You can filter down to the keys which have item value greater than 1.


回答 7

如另一个答案中所述,您不能在固定时间内执行此算法,因为您必须查看至少n位数字。线性时间是最快的。

但是,该算法可以在O(1)空间中完成。您只需要存储每个3位数字的计数,因此您需要一个包含1000个条目的数组。然后,您可以输入号码。

我的猜测是,当面试官给您解决方案时,他们会误以为是,或者当他们说“恒定空间”时,您会误以为“恒定时间”。

As mentioned in another answer, you cannot do this algorithm in constant time, because you must look at at least n digits. Linear time is the fastest you can get.

However, the algorithm can be done in O(1) space. You only need to store the counts of each 3 digit number, so you need an array of 1000 entries. You can then stream the number in.

My guess is that either the interviewer misspoke when they gave you the solution, or you misheard “constant time” when they said “constant space.”


回答 8

这是我的答案:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

数组查找方法非常快(甚至比@ paul-panzer的numpy方法还快!)。当然,它作弊是因为它在完成后并未在技术上完成,因为它正在返回生成器。它也不必检查每次迭代是否已经存在该值,这可能会有所帮助。

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

Here’s my answer:

from timeit import timeit
from collections import Counter
import types
import random

def setup_data(n):
    digits = "0123456789"
    return dict(text = ''.join(random.choice(digits) for i in range(n)))


def f_counter(text):
    c = Counter()
    for i in range(len(text)-2):
        ss = text[i:i+3]
        c.update([ss])
    return (i for i in c.items() if i[1] > 1)

def f_dict(text):
    d = {}
    for i in range(len(text)-2):
        ss = text[i:i+3]
        if ss not in d:
            d[ss] = 0
        d[ss] += 1
    return ((i, d[i]) for i in d if d[i] > 1)

def f_array(text):
    a = [[[0 for _ in range(10)] for _ in range(10)] for _ in range(10)]
    for n in range(len(text)-2):
        i, j, k = (int(ss) for ss in text[n:n+3])
        a[i][j][k] += 1
    for i, b in enumerate(a):
        for j, c in enumerate(b):
            for k, d in enumerate(c):
                if d > 1: yield (f'{i}{j}{k}', d)


for n in (1E1, 1E3, 1E6):
    n = int(n)
    data = setup_data(n)
    print(f'n = {n}')
    results = {}
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
            'results[name] = f(**data)', globals={'f':func, 'data':data, 'results':results, 'name':name}, number=10)*100))
    for r in results:
        print('{:10}: {}'.format(r, sorted(list(results[r]))[:5]))

The array lookup method is very fast (even faster than @paul-panzer’s numpy method!). Of course, it cheats since it isn’t technicailly finished after it completes, because it’s returning a generator. It also doesn’t have to check every iteration if the value already exists, which is likely to help a lot.

n = 10
counter               0.10595780 ms
dict                  0.01070654 ms
array                 0.00135370 ms
f_counter : []
f_dict    : []
f_array   : []
n = 1000
counter               2.89462101 ms
dict                  0.40434612 ms
array                 0.00073838 ms
f_counter : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_dict    : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
f_array   : [('008', 2), ('009', 3), ('010', 2), ('016', 2), ('017', 2)]
n = 1000000
counter            2849.00500992 ms
dict                438.44007806 ms
array                 0.00135370 ms
f_counter : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_dict    : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]
f_array   : [('000', 1058), ('001', 943), ('002', 1030), ('003', 982), ('004', 1042)]

回答 9

图片作为答案:

看起来像一个滑动窗口。

Image as answer:

Looks like a sliding window.


回答 10

这是我的解决方案:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

在for循环中具有一些创造力(例如,带有True / False / None的附加查找列表),您应该可以摆脱最后一行,因为您只想创建一个字典,直到我们访问该点为止。希望能帮助到你 :)

Here is my solution:

from collections import defaultdict
string = "103264685134845354863"
d = defaultdict(int)
for elt in range(len(string)-2):
    d[string[elt:elt+3]] += 1
d = {key: d[key] for key in d.keys() if d[key] > 1}

With a bit of creativity in for loop(and additional lookup list with True/False/None for example) you should be able to get rid of last line, as you only want to create keys in dict that we visited once up to that point. Hope it helps :)


回答 11

-从C角度讲。-您可以得到一个int 3-d数组结果[10] [10] [10]; -从第0个位置转到第n-4个位置,其中n是字符串数组的大小。-在每个位置上,检查当前,下一个和下一个下一个。-将cntr增加为resutls [current] [next] [next的下一个] ++;-打印的值

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-现在是O(n)时间,不涉及比较。-您可以在此处运行一些并行的东西,方法是对数组进行分区并计算分区之间的匹配项。

-Telling from the perspective of C. -You can have an int 3-d array results[10][10][10]; -Go from 0th location to n-4th location, where n being the size of the string array. -On each location, check the current, next and next’s next. -Increment the cntr as resutls[current][next][next’s next]++; -Print the values of

results[1][2][3]
results[2][3][4]
results[3][4][5]
results[4][5][6]
results[5][6][7]
results[6][7][8]
results[7][8][9]

-It is O(n) time, there is no comparisons involved. -You can run some parallel stuff here by partitioning the array and calculating the matches around the partitions.


回答 12

inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count
inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

count = {}
for i in range(len(inputStr) - 2):
    subNum = int(inputStr[i:i+3])
    if subNum not in count:
        count[subNum] = 1
    else:
        count[subNum] += 1

print count

泡泡排序作业

问题:泡泡排序作业

在课堂上,我们正在做排序算法,尽管我在谈论它们并编写伪代码时理解得很好,但是我在为它们编写实际代码时遇到了问题。

这是我在Python中的尝试:

mylist = [12, 5, 13, 8, 9, 65]

def bubble(badList):
    length = len(badList) - 1
    unsorted = True

    while unsorted:
        for element in range(0,length):
            unsorted = False
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                print badList
            else:
                unsorted = True

print bubble(mylist)

现在,(据我所知)该排序正确,但是一旦完成,它就会无限循环。

如何修复此代码,以便函数正确完成并正确排序任何(合理)大小的列表?

PS:我知道我不应该在函数中真正打印输出,而应该得到返回值,但是我还没有这样做,因为我的代码还没有真正起作用。

In class we are doing sorting algorithms and, although I understand them fine when talking about them and writing pseudocode, I am having problems writing actual code for them.

This is my attempt in Python:

mylist = [12, 5, 13, 8, 9, 65]

def bubble(badList):
    length = len(badList) - 1
    unsorted = True

    while unsorted:
        for element in range(0,length):
            unsorted = False
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                print badList
            else:
                unsorted = True

print bubble(mylist)

Now, this (as far as I can tell) sorts correctly, but once it finishes it just loops indefinitely.

How can this code be fixed so the function finishes properly and correctly sorts a list of any (reasonable) size?

P.S. I know I should not really have prints in a function and I should have a return, but I just have not done that yet as my code does not really work yet.


回答 0

为了解释为什么您的脚本现在无法正常工作,我将变量重命名unsortedsorted

首先,您的列表尚未排序。当然,我们设置sortedFalse

一旦开始while循环,我们就假定该列表已经排序。想法是这样的:一旦找到两个顺序不正确的元素,便将其设置sortedFalse只有在没有顺序错误的元素时sorted才会保留。True

sorted = False  # We haven't started sorting yet

while not sorted:
    sorted = True  # Assume the list is now sorted
    for element in range(0, length):
        if badList[element] > badList[element + 1]:
            sorted = False  # We found two elements in the wrong order
            hold = badList[element + 1]
            badList[element + 1] = badList[element]
            badList[element] = hold
    # We went through the whole list. At this point, if there were no elements
    # in the wrong order, sorted is still True. Otherwise, it's false, and the
    # while loop executes again.

还有一些小问题,可以帮助提高代码的效率或可读性。

  • for循环中,使用变量element。从技术上讲,element不是元素;它是代表列表索引的数字。而且,它很长。在这些情况下,只需使用一个临时变量名即可,例如i“ index”。

    for i in range(0, length):
  • range命令还可以仅接受一个参数(名为stop)。在这种情况下,您将获得从0到该参数的所有整数的列表。

    for i in range(length):
  • Python样式指南》建议使用下划线将变量命名为小写。对于这样的小脚本,这是一个很小的缺点。它会让您习惯于Python代码最常类似于的东西。

    def bubble(bad_list):
  • 要交换两个变量的值,请将它们写为元组分配。右侧被评估为元组(例如(badList[i+1], badList[i])is (3, 5)),然后被分配给左侧的两个变量((badList[i], badList[i+1]))。

    bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i]

放在一起,你会得到:

my_list = [12, 5, 13, 8, 9, 65]

def bubble(bad_list):
    length = len(bad_list) - 1
    sorted = False

    while not sorted:
        sorted = True
        for i in range(length):
            if bad_list[i] > bad_list[i+1]:
                sorted = False
                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i]

bubble(my_list)
print my_list

(顺便说一句,我也删除了您的打印声明。)

To explain why your script isn’t working right now, I’ll rename the variable unsorted to sorted.

At first, your list isn’t yet sorted. Of course, we set sorted to False.

As soon as we start the while loop, we assume that the list is already sorted. The idea is this: as soon as we find two elements that are not in the right order, we set sorted back to False. sorted will remain True only if there were no elements in the wrong order.

sorted = False  # We haven't started sorting yet

while not sorted:
    sorted = True  # Assume the list is now sorted
    for element in range(0, length):
        if badList[element] > badList[element + 1]:
            sorted = False  # We found two elements in the wrong order
            hold = badList[element + 1]
            badList[element + 1] = badList[element]
            badList[element] = hold
    # We went through the whole list. At this point, if there were no elements
    # in the wrong order, sorted is still True. Otherwise, it's false, and the
    # while loop executes again.

There are also minor little issues that would help the code be more efficient or readable.

  • In the for loop, you use the variable element. Technically, element is not an element; it’s a number representing a list index. Also, it’s quite long. In these cases, just use a temporary variable name, like i for “index”.

    for i in range(0, length):
    
  • The range command can also take just one argument (named stop). In that case, you get a list of all the integers from 0 to that argument.

    for i in range(length):
    
  • The Python Style Guide recommends that variables be named in lowercase with underscores. This is a very minor nitpick for a little script like this; it’s more to get you accustomed to what Python code most often resembles.

    def bubble(bad_list):
    
  • To swap the values of two variables, write them as a tuple assignment. The right hand side gets evaluated as a tuple (say, (badList[i+1], badList[i]) is (3, 5)) and then gets assigned to the two variables on the left hand side ((badList[i], badList[i+1])).

    bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i]
    

Put it all together, and you get this:

my_list = [12, 5, 13, 8, 9, 65]

def bubble(bad_list):
    length = len(bad_list) - 1
    sorted = False

    while not sorted:
        sorted = True
        for i in range(length):
            if bad_list[i] > bad_list[i+1]:
                sorted = False
                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i]

bubble(my_list)
print my_list

(I removed your print statement too, by the way.)


回答 1

气泡分类的目的是在每轮底部移动较重的项目,同时向上移动较轻的项目。在内部循环中,您可以比较元素,而不必每次都迭代整个列表。将最重的已经排在最后。该交换变量是一个额外的检查,所以我们可以标记列表现在可以正确排序,并避免不必要的计算仍在继续。

def bubble(badList):
    length = len(badList)
    for i in range(0,length):
        swapped = False
        for element in range(0, length-i-1):
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                swapped = True
        if not swapped: break

    return badList

您的版本1已更正:

def bubble(badList):
    length = len(badList) - 1
    unsorted = True
    while unsorted:
        unsorted = False
        for element in range(0,length):
            #unsorted = False
            if badList[element] > badList[element + 1]:
                 hold = badList[element + 1]
                 badList[element + 1] = badList[element]
                 badList[element] = hold
                 unsorted = True
                 #print badList
             #else:
                 #unsorted = True

     return badList

The goal of bubble sort is to move the heavier items at the bottom in each round, while moving the lighter items up. In the inner loop, where you compare the elements, you don’t have to iterate the whole list in each turn. The heaviest is already placed last. The swapped variable is an extra check so we can mark that the list is now sorted and avoid continuing with unnecessary calculations.

def bubble(badList):
    length = len(badList)
    for i in range(0,length):
        swapped = False
        for element in range(0, length-i-1):
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                swapped = True
        if not swapped: break

    return badList

Your version 1, corrected:

def bubble(badList):
    length = len(badList) - 1
    unsorted = True
    while unsorted:
        unsorted = False
        for element in range(0,length):
            #unsorted = False
            if badList[element] > badList[element + 1]:
                 hold = badList[element + 1]
                 badList[element + 1] = badList[element]
                 badList[element] = hold
                 unsorted = True
                 #print badList
             #else:
                 #unsorted = True

     return badList

回答 2

当您使用负含义的变量名时,您需要反转它们的值,这就是发生的情况。以下内容将更容易理解:

sorted = False
while not sorted:
    ...

另一方面,算法的逻辑有点偏离。您需要检查在for循环期间是否交换了两个元素。这是我的写法:

def bubble(values):
    length = len(values) - 1
    sorted = False
    while not sorted:
        sorted = True
        for element in range(0,length):
            if values[element] > values[element + 1]:
                 hold = values[element + 1]
                 values[element + 1] = values[element]
                 values[element] = hold
                 sorted = False
    return values

This is what happens when you use variable name of negative meaning, you need to invert their values. The following would be easier to understand:

sorted = False
while not sorted:
    ...

On the other hand, the logic of the algorithm is a little bit off. You need to check whether two elements swapped during the for loop. Here’s how I would write it:

def bubble(values):
    length = len(values) - 1
    sorted = False
    while not sorted:
        sorted = True
        for element in range(0,length):
            if values[element] > values[element + 1]:
                 hold = values[element + 1]
                 values[element + 1] = values[element]
                 values[element] = hold
                 sorted = False
    return values

回答 3

您对Unsorted变量的使用是错误的;您想要一个变量来告诉您是否交换了两个元素;如果这样做,则可以退出循环,否则,需要再次循环。要解决您在此处遇到的问题,只需在if情况的正文中输入“ unsorted = false”;删除其他情况;并在for循环前添加“ unsorted = true” 。

Your use of the Unsorted variable is wrong; you want to have a variable that tells you if you have swapped two elements; if you have done that, you can exit your loop, otherwise, you need to loop again. To fix what you’ve got here, just put the “unsorted = false” in the body of your if case; remove your else case; and put “unsorted = true before your for loop.


回答 4

def bubble_sort(l):
    for passes_left in range(len(l)-1, 0, -1):
        for index in range(passes_left):
            if l[index] < l[index + 1]:
               l[index], l[index + 1] = l[index + 1], l[index]
    return l
def bubble_sort(l):
    for passes_left in range(len(l)-1, 0, -1):
        for index in range(passes_left):
            if l[index] < l[index + 1]:
               l[index], l[index + 1] = l[index + 1], l[index]
    return l

回答 5

#一个非常简单的函数,可以通过减少第二个数组的问题空间来优化(显然)。但是相同的O(n ^ 2)复杂度。

def bubble(arr):
    l = len(arr)        
    for a in range(l):
        for b in range(l-1):
            if (arr[a] < arr[b]):
            arr[a], arr[b] = arr[b], arr[a]
    return arr 

#A very simple function, can be optimized (obviously) by decreasing the problem space of the 2nd array. But same O(n^2) complexity.

def bubble(arr):
    l = len(arr)        
    for a in range(l):
        for b in range(l-1):
            if (arr[a] < arr[b]):
            arr[a], arr[b] = arr[b], arr[a]
    return arr 

回答 6

您那里有几个错误。第一个是长度,第二个是您未排序的使用(如McWafflestix所述)。如果要打印列表,您可能还想返回该列表:

mylist = [12, 5, 13, 8, 9, 65]

def bubble(badList):
    length = len(badList) - 2
    unsorted = True

    while unsorted:
        for element in range(0,length):
            unsorted = False

            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                print badList
                unsorted = True

    return badList

print bubble(mylist)

埃塔:你是对的,上面的小车就像地狱一样。我的缺点是不通过更多示例进行测试。

def bubble2(badList):
    swapped = True
    length = len(badList) - 2

    while swapped:
        swapped = False
        for i in range(0, length):
            if badList[i] > badList[i + 1]:

                # swap
                hold = badList[i + 1]
                badList[i + 1] = badList[i]
                badList[i] = hold

                swapped = True

    return badList

You’ve got a couple of errors in there. The first is in length, and the second is in your use of unsorted (as stated by McWafflestix). You probably also want to return the list if you’re going to print it:

mylist = [12, 5, 13, 8, 9, 65]

def bubble(badList):
    length = len(badList) - 2
    unsorted = True

    while unsorted:
        for element in range(0,length):
            unsorted = False

            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                print badList
                unsorted = True

    return badList

print bubble(mylist)

eta: You’re right, the above is buggy as hell. My bad for not testing through some more examples.

def bubble2(badList):
    swapped = True
    length = len(badList) - 2

    while swapped:
        swapped = False
        for i in range(0, length):
            if badList[i] > badList[i + 1]:

                # swap
                hold = badList[i + 1]
                badList[i + 1] = badList[i]
                badList[i] = hold

                swapped = True

    return badList

回答 7

我是一个新鲜的初学者,昨天开始阅读有关Python的文章。受到您的榜样的启发,我创作了一些也许具有80年代风格的作品,但仍然可以

lista1 = [12, 5, 13, 8, 9, 65]

i=0
while i < len(lista1)-1:
    if lista1[i] > lista1[i+1]:
        x = lista1[i]
        lista1[i] = lista1[i+1]
        lista1[i+1] = x
        i=0
        continue
    else:
        i+=1

print(lista1)

I am a fresh fresh beginner, started to read about Python yesterday. Inspired by your example I created something maybe more in the 80-ties style, but nevertheless it kinda works

lista1 = [12, 5, 13, 8, 9, 65]

i=0
while i < len(lista1)-1:
    if lista1[i] > lista1[i+1]:
        x = lista1[i]
        lista1[i] = lista1[i+1]
        lista1[i+1] = x
        i=0
        continue
    else:
        i+=1

print(lista1)

回答 8

原始算法的问题在于,如果列表中的数字更小,它将无法将其带到正确的排序位置。程序每次都需要从头开始,以确保数字始终排序。

我简化了代码,它现在适用于任何数字列表,而不管列表如何,即使有重复的数字也是如此。这是代码

mylist = [9, 8, 5, 4, 12, 1, 7, 5, 2]
print mylist

def bubble(badList):
    length = len(badList) - 1
    element = 0
    while element < length:
        if badList[element] > badList[element + 1]:
            hold = badList[element + 1]
            badList[element + 1] = badList[element]
            badList[element] = hold
            element = 0
            print badList
        else:
            element = element + 1

print bubble(mylist)

The problem with the original algorithm is that if you had a lower number further in the list, it would not bring it to the correct sorted position. The program needs to go back the the beginning each time to ensure that the numbers sort all the way through.

I simplified the code and it will now work for any list of numbers regardless of the list and even if there are repeating numbers. Here’s the code

mylist = [9, 8, 5, 4, 12, 1, 7, 5, 2]
print mylist

def bubble(badList):
    length = len(badList) - 1
    element = 0
    while element < length:
        if badList[element] > badList[element + 1]:
            hold = badList[element + 1]
            badList[element + 1] = badList[element]
            badList[element] = hold
            element = 0
            print badList
        else:
            element = element + 1

print bubble(mylist)

回答 9

def bubble_sort(l):
    exchanged = True
    iteration = 0
    n = len(l)

    while(exchanged):
        iteration += 1
        exchanged = False

        # Move the largest element to the end of the list
        for i in range(n-1):
            if l[i] > l[i+1]:
                exchanged = True
                l[i], l[i+1] = l[i+1], l[i]
        n -= 1   # Largest element already towards the end

    print 'Iterations: %s' %(iteration)
    return l
def bubble_sort(l):
    exchanged = True
    iteration = 0
    n = len(l)

    while(exchanged):
        iteration += 1
        exchanged = False

        # Move the largest element to the end of the list
        for i in range(n-1):
            if l[i] > l[i+1]:
                exchanged = True
                l[i], l[i+1] = l[i+1], l[i]
        n -= 1   # Largest element already towards the end

    print 'Iterations: %s' %(iteration)
    return l

回答 10

def bubbleSort(alist):
if len(alist) <= 1:
    return alist
for i in range(0,len(alist)):
   print "i is :%d",i
   for j in range(0,i):
      print "j is:%d",j
      print "alist[i] is :%d, alist[j] is :%d"%(alist[i],alist[j])
      if alist[i] > alist[j]:
         alist[i],alist[j] = alist[j],alist[i]
return alist

alist = [54,26,93,17,77,31,44,55,20,-23,-34,16,11,11,11]

打印bubbleSort(alist)

def bubbleSort(alist):
if len(alist) <= 1:
    return alist
for i in range(0,len(alist)):
   print "i is :%d",i
   for j in range(0,i):
      print "j is:%d",j
      print "alist[i] is :%d, alist[j] is :%d"%(alist[i],alist[j])
      if alist[i] > alist[j]:
         alist[i],alist[j] = alist[j],alist[i]
return alist

alist = [54,26,93,17,77,31,44,55,20,-23,-34,16,11,11,11]

print bubbleSort(alist)


回答 11

def bubble_sort(a):
    t = 0
    sorted = False # sorted = False because we have not began to sort
    while not sorted:
    sorted = True # Assume sorted = True first, it will switch only there is any change
        for key in range(1,len(a)):
            if a[key-1] > a[key]:
                sorted = False
                t = a[key-1]; a[key-1] = a[key]; a[key] = t;
    print a
def bubble_sort(a):
    t = 0
    sorted = False # sorted = False because we have not began to sort
    while not sorted:
    sorted = True # Assume sorted = True first, it will switch only there is any change
        for key in range(1,len(a)):
            if a[key-1] > a[key]:
                sorted = False
                t = a[key-1]; a[key-1] = a[key]; a[key] = t;
    print a

回答 12

一个简单的例子:

a = len(alist)-1
while a > 0:
    for b in range(0,a):
        #compare with the adjacent element
        if alist[b]>=alist[b+1]:
            #swap both elements
            alist[b], alist[b+1] = alist[b+1], alist[b]
    a-=1

这只是简单地将元素从0到a(基本上是该回合中所有未排序的元素),并将其与相邻元素进行比较,如果大于相邻元素,则进行交换。在回合结束时,对最后一个元素进行排序,然后在没有该元素的情况下再次运行该过程,直到对所有元素进行了排序。

不需要条件是否sort成立。

请注意,此算法仅在交换时才考虑数字的位置,因此重复的数字不会对其产生影响。

PS。我知道这个问题发布已经很长时间了,但是我只想分享这个想法。

A simpler example:

a = len(alist)-1
while a > 0:
    for b in range(0,a):
        #compare with the adjacent element
        if alist[b]>=alist[b+1]:
            #swap both elements
            alist[b], alist[b+1] = alist[b+1], alist[b]
    a-=1

This simply takes the elements from 0 to a(basically, all the unsorted elements in that round) and compares it with its adjacent element, and making a swap if it is greater than its adjacent element. At the end the round, the last element is sorted, and the process runs again without it, until all elements have been sorted.

There is no need for a condition whether sort is true or not.

Note that this algorithm takes into consideration the position of the numbers only when swapping, so repeated numbers will not affect it.

PS. I know it has been very long since this question was posted, but I just wanted to share this idea.


回答 13

arr = [5,4,3,1,6,8,10,9] # array not sorted

for i in range(len(arr)):
    for j in range(i, len(arr)):
        if(arr[i] > arr[j]):
            arr[i], arr[j] = arr[j], arr[i]

            print (arr)
arr = [5,4,3,1,6,8,10,9] # array not sorted

for i in range(len(arr)):
    for j in range(i, len(arr)):
        if(arr[i] > arr[j]):
            arr[i], arr[j] = arr[j], arr[i]

            print (arr)

回答 14

def bubble_sort(li):
    l = len(li)
    tmp = None
    sorted_l = sorted(li)
    while (li != sorted_l):
        for ele in range(0,l-1):
            if li[ele] > li[ele+1]:
                tmp = li[ele+1]
                li[ele+1] = li [ele]
                li[ele] = tmp
    return li
def bubble_sort(li):
    l = len(li)
    tmp = None
    sorted_l = sorted(li)
    while (li != sorted_l):
        for ele in range(0,l-1):
            if li[ele] > li[ele+1]:
                tmp = li[ele+1]
                li[ele+1] = li [ele]
                li[ele] = tmp
    return li

回答 15

def bubbleSort ( arr ):
    swapped = True 
    length = len ( arr )
    j = 0

    while swapped:
        swapped = False
        j += 1 
        for i in range ( length  - j ):
            if arr [ i ] > arr [ i + 1 ]:
                # swap
                tmp = arr [ i ]
                arr [ i ] = arr [ i + 1]
                arr [ i + 1 ] = tmp 

                swapped = True

if __name__ == '__main__':
    # test list
    a = [ 67, 45, 39, -1, -5, -44 ];

    print ( a )
    bubbleSort ( a )
    print ( a )
def bubbleSort ( arr ):
    swapped = True 
    length = len ( arr )
    j = 0

    while swapped:
        swapped = False
        j += 1 
        for i in range ( length  - j ):
            if arr [ i ] > arr [ i + 1 ]:
                # swap
                tmp = arr [ i ]
                arr [ i ] = arr [ i + 1]
                arr [ i + 1 ] = tmp 

                swapped = True

if __name__ == '__main__':
    # test list
    a = [ 67, 45, 39, -1, -5, -44 ];

    print ( a )
    bubbleSort ( a )
    print ( a )

回答 16

def bubblesort(array):
    for i in range(len(array)-1):
        for j in range(len(array)-1-i):
            if array[j] > array[j+1]:
                array[j], array[j+1] = array[j+1], array[j]
    return(array)

print(bubblesort([3,1,6,2,5,4]))
def bubblesort(array):
    for i in range(len(array)-1):
        for j in range(len(array)-1-i):
            if array[j] > array[j+1]:
                array[j], array[j+1] = array[j+1], array[j]
    return(array)

print(bubblesort([3,1,6,2,5,4]))

回答 17

我考虑添加我的解决方案,因为这里曾经有解决方案

  1. 更长的时间
  2. 更大的空间复杂度
  3. 或进行过多的操作

那应该是

所以,这是我的解决方案:


def countInversions(arr):
    count = 0
    n = len(arr)
    for i in range(n):
        _count = count
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                count += 1
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
        if _count == count:
            break
    return count

I consider adding my solution because ever solution here is having

  1. greater time
  2. greater space complexity
  3. or doing too much operations

then is should be

So, here is my solution:


def countInversions(arr):
    count = 0
    n = len(arr)
    for i in range(n):
        _count = count
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                count += 1
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
        if _count == count:
            break
    return count

回答 18

如果有人对使用列表理解的较短实现感兴趣:

def bubble_sort(lst: list) -> None:
    [swap_items(lst, i, i+1) for left in range(len(lst)-1, 0, -1) for i in range(left) if lst[i] > lst[i+1]]


def swap_items(lst: list, pos1: int, pos2: int) -> None:
    lst[pos1], lst[pos2] = lst[pos2], lst[pos1]

If anyone is interested in a shorter implementation using a list comprehension:

def bubble_sort(lst: list) -> None:
    [swap_items(lst, i, i+1) for left in range(len(lst)-1, 0, -1) for i in range(left) if lst[i] > lst[i+1]]


def swap_items(lst: list, pos1: int, pos2: int) -> None:
    lst[pos1], lst[pos2] = lst[pos2], lst[pos1]

回答 19

这是不带for循环的气泡排序的另一种形式。基本上,您正在考虑的lastIndexarray然后慢慢进行decrementing,直到它成为数组的第一个索引为止。

algorithm将继续通过这样的阵列移动,直到整个通而没有任何由swaps发生。

泡沫基本上Quadratic Time: O(n²)是关于性能的。

class BubbleSort: 
  def __init__(self, arr):
    self.arr = arr;

  def bubbleSort(self):
    count = 0;
    lastIndex = len(self.arr) - 1;
    
    while(count < lastIndex):
      if(self.arr[count] > self.arr[count + 1]):
        self.swap(count)  
      count = count + 1;

      if(count == lastIndex):
        count = 0;
        lastIndex = lastIndex - 1;   

  def swap(self, count):
    temp = self.arr[count];
    self.arr[count] = self.arr[count + 1];
    self.arr[count + 1] = temp;
    
arr = [9, 1, 5, 3, 8, 2]
p1 = BubbleSort(arr)

print(p1.bubbleSort())

Here is a different variation of bubble sort without for loop. Basically you are considering the lastIndex of the array and slowly decrementing it until it first index of the array.

The algorithm will continue to move through the array like this until an entire pass is made without any swaps occurring.

The bubble is sort is basically Quadratic Time: O(n²) when it comes to performance.

class BubbleSort: 
  def __init__(self, arr):
    self.arr = arr;

  def bubbleSort(self):
    count = 0;
    lastIndex = len(self.arr) - 1;
    
    while(count < lastIndex):
      if(self.arr[count] > self.arr[count + 1]):
        self.swap(count)  
      count = count + 1;

      if(count == lastIndex):
        count = 0;
        lastIndex = lastIndex - 1;   

  def swap(self, count):
    temp = self.arr[count];
    self.arr[count] = self.arr[count + 1];
    self.arr[count + 1] = temp;
    
arr = [9, 1, 5, 3, 8, 2]
p1 = BubbleSort(arr)

print(p1.bubbleSort())

回答 20

the-fury和Martin Cote提供的答案解决了无限循环的问题,但是我的代码仍然无法正常工作(对于较大的列表,它将无法正确排序)。我最终放弃了unsorted变量,而是使用了计数器。

def bubble(badList):
    length = len(badList) - 1
    n = 0
    while n < len(badList):
        for element in range(0,length):
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                n = 0
            else:
                n += 1
    return badList

if __name__ == '__main__':
    mylist = [90, 10, 2, 76, 17, 66, 57, 23, 57, 99]
    print bubble(mylist)

如果有人可以在注释中提供任何有关如何改进我的代码的指针,将不胜感激。

Answers provided by the-fury and Martin Cote fixed the problem of the infinite loop, but my code would still not work correctly (for a larger list, it would not sort correctly.). I ended up ditching the unsorted variable and used a counter instead.

def bubble(badList):
    length = len(badList) - 1
    n = 0
    while n < len(badList):
        for element in range(0,length):
            if badList[element] > badList[element + 1]:
                hold = badList[element + 1]
                badList[element + 1] = badList[element]
                badList[element] = hold
                n = 0
            else:
                n += 1
    return badList

if __name__ == '__main__':
    mylist = [90, 10, 2, 76, 17, 66, 57, 23, 57, 99]
    print bubble(mylist)

If anyone could provide any pointers on how to improve my code in the comments, it would be much appreciated.


回答 21

试试这个

a = int(input("Enter Limit"))


val = []

for z in range(0,a):
    b = int(input("Enter Number in List"))
    val.append(b)


for y in range(0,len(val)):
   for x in range(0,len(val)-1):
       if val[x]>val[x+1]:
           t = val[x]
           val[x] = val[x+1]
           val[x+1] = t

print(val)

Try this

a = int(input("Enter Limit"))


val = []

for z in range(0,a):
    b = int(input("Enter Number in List"))
    val.append(b)


for y in range(0,len(val)):
   for x in range(0,len(val)-1):
       if val[x]>val[x+1]:
           t = val[x]
           val[x] = val[x+1]
           val[x+1] = t

print(val)

回答 22

idk如果这可能会在9年后对您有所帮助…它是一个简单的气泡排序程序

    l=[1,6,3,7,5,9,8,2,4,10]

    for i in range(1,len(l)):
        for j in range (i+1,len(l)):
            if l[i]>l[j]:
                l[i],l[j]=l[j],l[i]

idk if this might help you after 9 years… its a simple bubble sort program

    l=[1,6,3,7,5,9,8,2,4,10]

    for i in range(1,len(l)):
        for j in range (i+1,len(l)):
            if l[i]>l[j]:
                l[i],l[j]=l[j],l[i]

回答 23

def merge_bubble(arr):
    k = len(arr)
    while k>2:
        for i in range(0,k-1):
            for j in range(0,k-1):
                if arr[j] > arr[j+1]:
                    arr[j],arr[j+1] = arr[j+1],arr[j]

        return arr
        break
    else:
        if arr[0] > arr[1]:
            arr[0],arr[1] = arr[1],arr[0]
        return arr 
def merge_bubble(arr):
    k = len(arr)
    while k>2:
        for i in range(0,k-1):
            for j in range(0,k-1):
                if arr[j] > arr[j+1]:
                    arr[j],arr[j+1] = arr[j+1],arr[j]

        return arr
        break
    else:
        if arr[0] > arr[1]:
            arr[0],arr[1] = arr[1],arr[0]
        return arr 

回答 24

def bubble_sort(l):
    for i in range(len(l) -1):
        for j in range(len(l)-i-1):
            if l[j] > l[j+1]:
                l[j],l[j+1] = l[j+1], l[j]
    return l
def bubble_sort(l):
    for i in range(len(l) -1):
        for j in range(len(l)-i-1):
            if l[j] > l[j+1]:
                l[j],l[j+1] = l[j+1], l[j]
    return l

回答 25

def bubble_sorted(arr:list):
    while True:
        for i in range(0,len(arr)-1):
            count = 0
            if arr[i] > arr[i+1]:
                count += 1
                arr[i], arr[i+1] = arr[i+1], arr[i]
        if count == 0:
            break
    return arr
arr = [30,20,80,40,50,10,60,70,90]
print(bubble_sorted(arr))
#[20, 30, 40, 50, 10, 60, 70, 80, 90]
def bubble_sorted(arr:list):
    while True:
        for i in range(0,len(arr)-1):
            count = 0
            if arr[i] > arr[i+1]:
                count += 1
                arr[i], arr[i+1] = arr[i+1], arr[i]
        if count == 0:
            break
    return arr
arr = [30,20,80,40,50,10,60,70,90]
print(bubble_sorted(arr))
#[20, 30, 40, 50, 10, 60, 70, 80, 90]

回答 26

def bubbleSort(a): def swap(x, y): temp = a[x] a[x] = a[y] a[y] = temp #outer loop for j in range(len(a)): #slicing to the center, inner loop, python style for i in range(j, len(a) - j):
#find the min index and swap if a[i] < a[j]: swap(j, i) #find the max index and swap if a[i] > a[len(a) - j - 1]: swap(len(a) - j - 1, i) return a

def bubbleSort(a): def swap(x, y): temp = a[x] a[x] = a[y] a[y] = temp #outer loop for j in range(len(a)): #slicing to the center, inner loop, python style for i in range(j, len(a) - j):
#find the min index and swap if a[i] < a[j]: swap(j, i) #find the max index and swap if a[i] > a[len(a) - j - 1]: swap(len(a) - j - 1, i) return a


在球体上平均分配n个点

问题:在球体上平均分配n个点

我需要一种算法,该算法可以使我在球体上的位置保持N个点(可能少于20个),并模糊地将它们分散开。不需要“完美”,但是我只需要它,所以它们都不会聚在一起。

  • 这个问题提供了很好的代码,但是我找不到统一的方法,因为这似乎是100%随机的。
  • 推荐的这篇博客文章有两种方法可以输入球体上的点数,但是Saff和Kuijlaars算法恰好是我可以转录的伪代码,而我发现的代码示例包含“ node [k]”,而我无法看到解释并破坏了这种可能性。第二个博客示例是“黄金分割螺旋”,它给了我奇怪的,成堆的结果,但没有明确的方法来定义恒定半径。
  • 这种算法这个问题好像它可能工作,但我不能拼凑那是什么网页上成伪代码或任何东西。

我遇到的其他一些问题线程涉及随机均匀分布,这增加了我不关心的复杂程度。我很抱歉这是一个愚蠢的问题,但是我想表明我确实看上去很努力,但仍然表现不佳。

因此,我要寻找的是简单的伪代码,以将N个点均匀地分布在一个单位球体上,该点以球坐标或笛卡尔坐标返回。如果它甚至可以通过一点随机分布就更好了(想想围绕一颗恒星的行星,适当散开,但还有回旋余地)。

I need an algorithm that can give me positions around a sphere for N points (less than 20, probably) that vaguely spreads them out. There’s no need for “perfection”, but I just need it so none of them are bunched together.

  • This question provided good code, but I couldn’t find a way to make this uniform, as this seemed 100% randomized.
  • This blog post recommended had two ways allowing input of number of points on the sphere, but the Saff and Kuijlaars algorithm is exactly in psuedocode I could transcribe, and the code example I found contained “node[k]”, which I couldn’t see explained and ruined that possibility. The second blog example was the Golden Section Spiral, which gave me strange, bunched up results, with no clear way to define a constant radius.
  • This algorithm from this question seems like it could possibly work, but I can’t piece together what’s on that page into psuedocode or anything.

A few other question threads I came across spoke of randomized uniform distribution, which adds a level of complexity I’m not concerned about. I apologize that this is such a silly question, but I wanted to show that I’ve truly looked hard and still come up short.

So, what I’m looking for is simple pseudocode to evenly distribute N points around a unit sphere, that either returns in spherical or Cartesian coordinates. Even better if it can even distribute with a bit of randomization (think planets around a star, decently spread out, but with room for leeway).


回答 0

此示例中,代码 node[k]只是第k个节点。您正在生成一个数组N个点,它node[k]是第k个(从0到N-1)。如果这一切使您感到困惑,希望您现在就可以使用它。

(换句话说,k是大小为N的数组,该数组在代码片段开始之前定义,并且包含点列表)。

或者,在此处建立另一个答案(并使用Python):

> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
    lon = 360 * ((x+0.5) / nx)
    for y in range(ny):                                                         
        midpt = (y+0.5) / ny                                                    
        lat = 180 * asin(2*((y+0.5)/ny-0.5))                                    
        print lon,lat                                                           
> python2.7 ll.py                                                      
45.0 -166.91313924                                                              
45.0 -74.0730322921                                                             
45.0 0.0                                                                        
45.0 74.0730322921                                                              
45.0 166.91313924                                                               
135.0 -166.91313924                                                             
135.0 -74.0730322921                                                            
135.0 0.0                                                                       
135.0 74.0730322921                                                             
135.0 166.91313924                                                              
225.0 -166.91313924                                                             
225.0 -74.0730322921                                                            
225.0 0.0                                                                       
225.0 74.0730322921                                                             
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924

如果进行绘制,您会发现两极附近的垂直间距较大,因此每个点都位于大约相同的总面积内空间中(在两极附近,“水平”空间较小,因此“垂直”空间更大) )。

这与所有点到邻居的距离都差不多(这是我认为您的链接所要讨论的)不同,但这可能足以满足您的需求,并且只需制作一个统一的经纬度网格即可进行改进。

In this example code node[k] is just the kth node. You are generating an array N points and node[k] is the kth (from 0 to N-1). If that is all that is confusing you, hopefully you can use that now.

(in other words, k is an array of size N that is defined before the code fragment starts, and which contains a list of the points).

Alternatively, building on the other answer here (and using Python):

> cat ll.py
from math import asin
nx = 4; ny = 5
for x in range(nx):
    lon = 360 * ((x+0.5) / nx)
    for y in range(ny):                                                         
        midpt = (y+0.5) / ny                                                    
        lat = 180 * asin(2*((y+0.5)/ny-0.5))                                    
        print lon,lat                                                           
> python2.7 ll.py                                                      
45.0 -166.91313924                                                              
45.0 -74.0730322921                                                             
45.0 0.0                                                                        
45.0 74.0730322921                                                              
45.0 166.91313924                                                               
135.0 -166.91313924                                                             
135.0 -74.0730322921                                                            
135.0 0.0                                                                       
135.0 74.0730322921                                                             
135.0 166.91313924                                                              
225.0 -166.91313924                                                             
225.0 -74.0730322921                                                            
225.0 0.0                                                                       
225.0 74.0730322921                                                             
225.0 166.91313924
315.0 -166.91313924
315.0 -74.0730322921
315.0 0.0
315.0 74.0730322921
315.0 166.91313924

If you plot that, you’ll see that the vertical spacing is larger near the poles so that each point is situated in about the same total area of space (near the poles there’s less space “horizontally”, so it gives more “vertically”).

This isn’t the same as all points having about the same distance to their neighbours (which is what I think your links are talking about), but it may be sufficient for what you want and improves on simply making a uniform lat/lon grid.


回答 1

斐波那契球算法对此非常有用。它速度快,并且结果一目了然,很容易使人眼蒙蔽。您可以看到一个处理完成的示例,该处理将随着时间的增加显示结果。这是 @gman制作的另一个出色的交互式示例。这是python中的一个简单实现。

import math


def fibonacci_sphere(samples=1):

    points = []
    phi = math.pi * (3. - math.sqrt(5.))  # golden angle in radians

    for i in range(samples):
        y = 1 - (i / float(samples - 1)) * 2  # y goes from 1 to -1
        radius = math.sqrt(1 - y * y)  # radius at y

        theta = phi * i  # golden angle increment

        x = math.cos(theta) * radius
        z = math.sin(theta) * radius

        points.append((x, y, z))

    return points

1000个样本可为您提供:

The Fibonacci sphere algorithm is great for this. It is fast and gives results that at a glance will easily fool the human eye. You can see an example done with processing which will show the result over time as points are added. Here’s another great interactive example made by @gman. And here’s a simple implementation in python.

import math


def fibonacci_sphere(samples=1):

    points = []
    phi = math.pi * (3. - math.sqrt(5.))  # golden angle in radians

    for i in range(samples):
        y = 1 - (i / float(samples - 1)) * 2  # y goes from 1 to -1
        radius = math.sqrt(1 - y * y)  # radius at y

        theta = phi * i  # golden angle increment

        x = math.cos(theta) * radius
        z = math.sin(theta) * radius

        points.append((x, y, z))

    return points

1000 samples gives you this:


回答 2

黄金螺旋法

您说您无法使用金色螺旋方法,但这很遗憾,因为它确实非常好。我想给您一个完整的了解,以便您也许可以理解如何避免这一问题。

因此,这是一种快速,非随机的方式来创建近似正确的晶格。如上所述,没有一个晶格会是完美的,但这可能就足够了。它与其他方法(例如BendWavy.org中的方法)进行了比较,但它的外观漂亮美观,并且可以保证极限间距的均匀性。

底漆:单位盘上的向日葵螺旋

为了理解该算法,我首先邀请您看一下2D向日葵螺旋算法。这是基于这样一个事实,即最不合理的数字是黄金分割率(1 + sqrt(5))/2,如果一个人通过“站在中心,转动整个黄金分割率,然后在那个方向发射另一个点”的方法来发射点,则自然会构造一个螺旋线,随着您获得越来越多的点数,尽管如此,仍然拒绝拥有明确排列的“条形”,使这些点排成一行。(注1)

磁盘上均匀间距的算法是

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

并产生如下结果(n = 100和n = 1000):

径向间隔点

关键的怪事是公式r = sqrt(indices / num_pts); 我怎么来的那个?(笔记2。)

好吧,我在这里使用平方根,因为我希望它们在磁盘周围具有均匀的区域间距。这是相同的话说,在大的限度Ñ我想一点区域ř ∈([R – [R + d – [R ),Θ ∈(θθ + d θ)来包含正比于它的区域的多个点,这是– [R d – [R d θ。现在,如果我们假装我们在这里谈论一个随机变量,这可以直接解释为说(RΘ)仅为cr对于某些常数的联合概率密度 c。然后在单位磁盘上进行归一化将迫使c = 1 /π。

现在让我介绍一个技巧。它来自概率论在那里它被称为采样逆CDF:假设你想生成的概率密度的随机变量˚FZ ^),你有一个随机变量û〜制服(0,1),就像来的出random()在大多数编程语言中。你怎么做到这一点?

  1. 首先,将您的密度转换为累积分布函数或CDF,我们将其称为Fz)。请记住,CDF随导数fz)。
  2. 然后计算CDF的反函数F -1z)。
  3. 您会发现Z = F -1U)根据目标密度分布。(注3)。

现在,黄金比例螺旋技巧将点以θ的均匀分布方式隔开,因此我们将其积分;对于单位磁盘,我们剩下Fr)= r 2。因此反函数为F -1u)= u 1/2,因此我们将在磁盘上生成极坐标为的随机点r = sqrt(random()); theta = 2 * pi * random()

现在,我们不再对这个反函数进行随机采样,而是对其进行均匀采样,而关于均匀采样的好处是,关于点如何在大N的限制内扩展的结果将表现得就像我们对随机函数采样一样。这种组合是诀窍。而不是random()使用(arange(0, num_pts, dtype=float) + 0.5)/num_pts,也就是说,如果我们要采样10个点,则为r = 0.05, 0.15, 0.25, ... 0.95。我们统一采样r以获得相等的区域间距,并使用向日葵增量来避免输出中的点“棒”。

现在在球上做向日葵

我们需要对球点进行点更改仅涉及将极坐标切换为球坐标。径向坐标当然不会进入此范围,因为我们位于单位球体上。为了让事情变得更加一致这里,尽管我是一位训练有素的物理学家,我会用数学家的坐标,其中0≤ φ ≤π就是北纬从极点和0≤下来θ ≤2π东经。因此与上面的区别在于,我们基本上是用φ代替变量r

我们的区域元素,这是[R d [R d θ,现在变成了没有,备受更复杂的罪孽(φ)d φ d θ。因此,我们对统一的间距联合密度是罪(φ)/4π。积分出θ,我们发现˚Fφ)= SIN(φ)/ 2,从而˚Fφ)=(1 – COS(φ))/ 2。反相此我们可以看到,一个均匀随机变量看起来像ACOS(1 – 2 ü),但我们采样均匀,而不是随机的,所以我们改为使用φ ķ = ACOS(1 – 2( ķ+ 0.5)/ N)。算法的其余部分只是将其投影到x,y和z坐标上:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()

再次对于n = 100和n = 1000,结果如下所示:

进一步的研究

我想大呼马丁·罗伯茨(Martin Roberts)的博客。请注意,上面我通过向每个索引添加0.5来创建索引的偏移量。这只是视觉上吸引我,但是事实证明,偏移量的选择很重要,并且在整个间隔内不是恒定不变的,并且如果选择正确,可能意味着包装精度提高了8%。还应该有一种方法可以使他的R 2序列覆盖一个球体,很有趣的是,看看它是否也产生了很好的均匀覆盖,也许是原样,但也许只需要从一半单位正方形沿对角线方向左右切开,然后拉伸得到一个圆。

笔记

  1. 这些“条”是由对数字的有理逼近形成的,而对数字的最佳有理逼近来自其连续的分数表达式,z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...)))其中z是整数,并且n_1, n_2, n_3, ...是正整数的有限或无限序列:

    def continued_fraction(r):
        while r != 0:
            n = floor(r)
            yield n
            r = 1/(r - n)

    由于分数部分1/(...)始终在零和一之间,因此连续分数中的大整数可以提供特别好的有理近似值:“一个除以100和101之间的值”比“一个除以1-2之间的值”要好。因此,最不合理的数字是一个,1 + 1/(1 + 1/(1 + ...))并且没有特别好的有理近似值。通过乘以φ可以得出黄金分割率的公式,从而可以求解φ = 1 + 1 / φ

  2. 对于不太熟悉NumPy的人们-所有功能都是“矢量化的”,因此sqrt(array)与其他语言可能会写的相同map(sqrt, array)。因此,这是一个逐个组件的sqrt应用程序。标量除法或标量加法也是如此-并行适用于所有组件。

  3. 一旦您知道这是结果,证明就很简单。如果您问z < Z < z + d z的概率是什么,这与问z < F -1U)< z + d z的概率是什么,将F应用于所有三个表达式表示它是一个单调递增的函数,因此Fz)< U < Fz + d z),向外扩展右侧以找到Fz)+ fz)d z,并且由于U是均匀的,因此如所承诺的,该概率仅为fz)d z

The golden spiral method

You said you couldn’t get the golden spiral method to work and that’s a shame because it’s really, really good. I would like to give you a complete understanding of it so that maybe you can understand how to keep this away from being “bunched up.”

So here’s a fast, non-random way to create a lattice that is approximately correct; as discussed above, no lattice will be perfect, but this may be good enough. It is compared to other methods e.g. at BendWavy.org but it just has a nice and pretty look as well as a guarantee about even spacing in the limit.

Primer: sunflower spirals on the unit disk

To understand this algorithm, I first invite you to look at the 2D sunflower spiral algorithm. This is based on the fact that the most irrational number is the golden ratio (1 + sqrt(5))/2 and if one emits points by the approach “stand at the center, turn a golden ratio of whole turns, then emit another point in that direction,” one naturally constructs a spiral which, as you get to higher and higher numbers of points, nevertheless refuses to have well-defined ‘bars’ that the points line up on.(Note 1.)

The algorithm for even spacing on a disk is,

from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as pp

num_pts = 100
indices = arange(0, num_pts, dtype=float) + 0.5

r = sqrt(indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

pp.scatter(r*cos(theta), r*sin(theta))
pp.show()

and it produces results that look like (n=100 and n=1000):

Spacing the points radially

The key strange thing is the formula r = sqrt(indices / num_pts); how did I come to that one? (Note 2.)

Well, I am using the square root here because I want these to have even-area spacing around the disk. That is the same as saying that in the limit of large N I want a little region R ∈ (r, r + dr), Θ ∈ (θ, θ + dθ) to contain a number of points proportional to its area, which is r dr dθ. Now if we pretend that we are talking about a random variable here, this has a straightforward interpretation as saying that the joint probability density for (R, Θ) is just c r for some constant c. Normalization on the unit disk would then force c = 1/π.

Now let me introduce a trick. It comes from probability theory where it’s known as sampling the inverse CDF: suppose you wanted to generate a random variable with a probability density f(z) and you have a random variable U ~ Uniform(0, 1), just like comes out of random() in most programming languages. How do you do this?

  1. First, turn your density into a cumulative distribution function or CDF, which we will call F(z). A CDF, remember, increases monotonically from 0 to 1 with derivative f(z).
  2. Then calculate the CDF’s inverse function F-1(z).
  3. You will find that Z = F-1(U) is distributed according to the target density. (Note 3).

Now the golden-ratio spiral trick spaces the points out in a nicely even pattern for θ so let’s integrate that out; for the unit disk we are left with F(r) = r2. So the inverse function is F-1(u) = u1/2, and therefore we would generate random points on the disk in polar coordinates with r = sqrt(random()); theta = 2 * pi * random().

Now instead of randomly sampling this inverse function we’re uniformly sampling it, and the nice thing about uniform sampling is that our results about how points are spread out in the limit of large N will behave as if we had randomly sampled it. This combination is the trick. Instead of random() we use (arange(0, num_pts, dtype=float) + 0.5)/num_pts, so that, say, if we want to sample 10 points they are r = 0.05, 0.15, 0.25, ... 0.95. We uniformly sample r to get equal-area spacing, and we use the sunflower increment to avoid awful “bars” of points in the output.

Now doing the sunflower on a sphere

The changes that we need to make to dot the sphere with points merely involve switching out the polar coordinates for spherical coordinates. The radial coordinate of course doesn’t enter into this because we’re on a unit sphere. To keep things a little more consistent here, even though I was trained as a physicist I’ll use mathematicians’ coordinates where 0 ≤ φ ≤ π is latitude coming down from the pole and 0 ≤ θ ≤ 2π is longitude. So the difference from above is that we are basically replacing the variable r with φ.

Our area element, which was r dr dθ, now becomes the not-much-more-complicated sin(φ) dφ dθ. So our joint density for uniform spacing is sin(φ)/4π. Integrating out θ, we find f(φ) = sin(φ)/2, thus F(φ) = (1 − cos(φ))/2. Inverting this we can see that a uniform random variable would look like acos(1 – 2 u), but we sample uniformly instead of randomly, so we instead use φk = acos(1 − 2 (k + 0.5)/N). And the rest of the algorithm is just projecting this onto the x, y, and z coordinates:

from numpy import pi, cos, sin, arccos, arange
import mpl_toolkits.mplot3d
import matplotlib.pyplot as pp

num_pts = 1000
indices = arange(0, num_pts, dtype=float) + 0.5

phi = arccos(1 - 2*indices/num_pts)
theta = pi * (1 + 5**0.5) * indices

x, y, z = cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi);

pp.figure().add_subplot(111, projection='3d').scatter(x, y, z);
pp.show()

Again for n=100 and n=1000 the results look like:

Further research

I wanted to give a shout out to Martin Roberts’s blog. Note that above I created an offset of my indices by adding 0.5 to each index. This was just visually appealing to me, but it turns out that the choice of offset matters a lot and is not constant over the interval and can mean getting as much as 8% better accuracy in packing if chosen correctly. There should also be a way to get his R2 sequence to cover a sphere and it would be interesting to see if this also produced a nice even covering, perhaps as-is but perhaps needing to be, say, taken from only a half of the unit square cut diagonally or so and stretched around to get a circle.

Notes

  1. Those “bars” are formed by rational approximations to a number, and the best rational approximations to a number come from its continued fraction expression, z + 1/(n_1 + 1/(n_2 + 1/(n_3 + ...))) where z is an integer and n_1, n_2, n_3, ... is either a finite or infinite sequence of positive integers:

    def continued_fraction(r):
        while r != 0:
            n = floor(r)
            yield n
            r = 1/(r - n)
    

    Since the fraction part 1/(...) is always between zero and one, a large integer in the continued fraction allows for a particularly good rational approximation: “one divided by something between 100 and 101” is better than “one divided by something between 1 and 2.” The most irrational number is therefore the one which is 1 + 1/(1 + 1/(1 + ...)) and has no particularly good rational approximations; one can solve φ = 1 + 1/φ by multiplying through by φ to get the formula for the golden ratio.

  2. For folks who are not so familiar with NumPy — all of the functions are “vectorized,” so that sqrt(array) is the same as what other languages might write map(sqrt, array). So this is a component-by-component sqrt application. The same also holds for division by a scalar or addition with scalars — those apply to all components in parallel.

  3. The proof is simple once you know that this is the result. If you ask what’s the probability that z < Z < z + dz, this is the same as asking what’s the probability that z < F-1(U) < z + dz, apply F to all three expressions noting that it is a monotonically increasing function, hence F(z) < U < F(z + dz), expand the right hand side out to find F(z) + f(z) dz, and since U is uniform this probability is just f(z) dz as promised.


回答 3

这被称为球体上的堆积点,并且没有(已知)一般的完美解决方案。但是,有许多不完善的解决方案。最受欢迎的三个似乎是:

  1. 创建一个模拟。将每个点视为约束在球体上的电子,然后运行一定数量的步骤进行仿真。电子的排斥力自然会使系统趋于更稳定的状态,在这些状态下,点之间的距离尽可能远。
  2. 超立方体排斥。这种花哨的方法实际上非常简单:您可以在围绕球体的立方体内部统一选择点(远远超过n它们),然后拒绝球体外部的点。将其余点视为向量,并将其标准化。这些是您的“样本”- n使用某种方法(随机,贪婪等)选择它们。
  3. 螺旋近似。您围绕球体跟踪螺旋,并在螺旋周围均匀分布点。由于涉及数学,因此与模拟相比,它们的理解更为复杂,但速度更快(并且可能涉及的代码更少)。最受欢迎的似乎是Saff等人

一个很多关于这个问题的更多信息,可以发现这里

This is known as packing points on a sphere, and there is no (known) general, perfect solution. However, there are plenty of imperfect solutions. The three most popular seem to be:

  1. Create a simulation. Treat each point as an electron constrained to a sphere, then run a simulation for a certain number of steps. The electrons’ repulsion will naturally tend the system to a more stable state, where the points are about as far away from each other as they can get.
  2. Hypercube rejection. This fancy-sounding method is actually really simple: you uniformly choose points (much more than n of them) inside of the cube surrounding the sphere, then reject the points outside of the sphere. Treat the remaining points as vectors, and normalize them. These are your “samples” – choose n of them using some method (randomly, greedy, etc).
  3. Spiral approximations. You trace a spiral around a sphere, and evenly-distribute the points around the spiral. Because of the mathematics involved, these are more complicated to understand than the simulation, but much faster (and probably involving less code). The most popular seems to be by Saff, et al.

A lot more information about this problem can be found here


回答 4

您要寻找的是球形覆盖物。球形覆盖问题非常棘手,除了少数点之外,其他解决方案都是未知的。可以肯定知道的一件事是,给定一个球体上的n个点,总是存在两个距离d = (4-csc^2(\pi n/6(n-2)))^(1/2)或更近的点。

如果您想要一种概率方法来生成均匀分布在球体上的点,则很简单:通过高斯分布在空间中均匀生成点(它内置于Java中,不难找到其他语言的代码)。因此,在3维空间中,您需要

Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };

然后通过将其与原点的距离归一化来将点投影到球体上

double norm = Math.sqrt( (p[0])^2 + (p[1])^2 + (p[2])^2 ); 
double[] sphereRandomPoint = { p[0]/norm, p[1]/norm, p[2]/norm };

n维上的高斯分布是球对称的,因此到球上的投影是均匀的。

当然,不能保证在统一生成的点的集合中任意两个点之间的距离都将限制在下面,因此您可以使用拒绝来强制执行您可能具有的任何此类条件:可能最好先生成整个集合,然后再生成如有必要,拒绝整个收藏。(或者使用“早期拒绝”来拒绝您到目前为止生成的整个集合;只是不要保留某些要点,而要丢弃其他要点。)您可以使用d上面给出的公式减去一些懈怠来确定之间的最小距离点以下,您将拒绝一组点。您必须计算n选择2个距离,拒绝的概率取决于松弛度;很难说是怎么回事,所以运行模拟以了解相关的统计信息。

What you are looking for is called a spherical covering. The spherical covering problem is very hard and solutions are unknown except for small numbers of points. One thing that is known for sure is that given n points on a sphere, there always exist two points of distance d = (4-csc^2(\pi n/6(n-2)))^(1/2) or closer.

If you want a probabilistic method for generating points uniformly distributed on a sphere, it’s easy: generate points in space uniformly by Gaussian distribution (it’s built into Java, not hard to find the code for other languages). So in 3-dimensional space, you need something like

Random r = new Random();
double[] p = { r.nextGaussian(), r.nextGaussian(), r.nextGaussian() };

Then project the point onto the sphere by normalizing its distance from the origin

double norm = Math.sqrt( (p[0])^2 + (p[1])^2 + (p[2])^2 ); 
double[] sphereRandomPoint = { p[0]/norm, p[1]/norm, p[2]/norm };

The Gaussian distribution in n dimensions is spherically symmetric so the projection onto the sphere is uniform.

Of course, there’s no guarantee that the distance between any two points in a collection of uniformly generated points will be bounded below, so you can use rejection to enforce any such conditions that you might have: probably it’s best to generate the whole collection and then reject the whole collection if necessary. (Or use “early rejection” to reject the whole collection you’ve generated so far; just don’t keep some points and drop others.) You can use the formula for d given above, minus some slack, to determine the min distance between points below which you will reject a set of points. You’ll have to calculate n choose 2 distances, and the probability of rejection will depend on the slack; it’s hard to say how, so run a simulation to get a feel for the relevant statistics.


回答 5

该答案基于该答案很好概述的相同“理论”

我将这个答案添加为: -此外,很难“摸索”如何在没有图像的情况下区分其他选项,因此,这是此选项的外观(如下)以及可立即运行的实现随之而来。
-其他选项均不能满足“均匀性”需求(即显然不是这样)。(注意在原始问题中特别希望获得类似行星分布的行为,您只是从有限的列表中随机拒绝了k个均匀创建的点(随机返回k个项目中的索引计数)。)
-最接近其他暗示迫使您通过“角轴”来决定“ N”,而跨两个角轴值仅是“ N的一个值”(在N较小的情况下,要知道可能会是什么还是可能不会很重要(例如,您想获得“ 5分”-尽享乐趣))

N为20时:


然后N在80:


这是现成的python3代码,其仿真是相同的来源:“ http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ” 。(我包含的绘图在以“ main”运行时会触发,取自:http : //www.scipy.org/Cookbook/Matplotlib/mplot3D

from math import cos, sin, pi, sqrt

def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
    """ each point you get will be of form 'x, y, z'; in cartesian coordinates
        eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0 
        ------------
        converted from:  http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ) 
    """
    dlong = pi*(3.0-sqrt(5.0))  # ~2.39996323 
    dz   =  2.0/numberOfPoints
    long =  0.0
    z    =  1.0 - dz/2.0
    ptsOnSphere =[]
    for k in range( 0, numberOfPoints): 
        r    = sqrt(1.0-z*z)
        ptNew = (cos(long)*r, sin(long)*r, z)
        ptsOnSphere.append( ptNew )
        z    = z - dz
        long = long + dlong
    return ptsOnSphere

if __name__ == '__main__':                
    ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)    

    #toggle True/False to print them
    if( True ):    
        for pt in ptsOnSphere:  print( pt)

    #toggle True/False to plot them
    if(True):
        from numpy import *
        import pylab as p
        import mpl_toolkits.mplot3d.axes3d as p3

        fig=p.figure()
        ax = p3.Axes3D(fig)

        x_s=[];y_s=[]; z_s=[]

        for pt in ptsOnSphere:
            x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])

        ax.scatter3D( array( x_s), array( y_s), array( z_s) )                
        ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
        p.show()
        #end

经过低计数测试(N以2、5、7、13等表示),并且看起来“不错”

This answer is based on the same ‘theory’ that is outlined well by this answer

I’m adding this answer as:
— None of the other options fit the ‘uniformity’ need ‘spot-on’ (or not obviously-clearly so). (Noting to get the planet like distribution looking behavior particurally wanted in the original ask, you just reject from the finite list of the k uniformly created points at random (random wrt the index count in the k items back).)
–The closest other impl forced you to decide the ‘N’ by ‘angular axis’, vs. just ‘one value of N’ across both angular axis values ( which at low counts of N is very tricky to know what may, or may not matter (e.g. you want ‘5’ points — have fun ) )
–Furthermore, it’s very hard to ‘grok’ how to differentiate between the other options without any imagery, so here’s what this option looks like (below), and the ready-to-run implementation that goes with it.

with N at 20:


and then N at 80:


here’s the ready-to-run python3 code, where the emulation is that same source: ” http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ” found by others. ( The plotting I’ve included, that fires when run as ‘main,’ is taken from: http://www.scipy.org/Cookbook/Matplotlib/mplot3D )

from math import cos, sin, pi, sqrt

def GetPointsEquiAngularlyDistancedOnSphere(numberOfPoints=45):
    """ each point you get will be of form 'x, y, z'; in cartesian coordinates
        eg. the 'l2 distance' from the origion [0., 0., 0.] for each point will be 1.0 
        ------------
        converted from:  http://web.archive.org/web/20120421191837/http://www.cgafaq.info/wiki/Evenly_distributed_points_on_sphere ) 
    """
    dlong = pi*(3.0-sqrt(5.0))  # ~2.39996323 
    dz   =  2.0/numberOfPoints
    long =  0.0
    z    =  1.0 - dz/2.0
    ptsOnSphere =[]
    for k in range( 0, numberOfPoints): 
        r    = sqrt(1.0-z*z)
        ptNew = (cos(long)*r, sin(long)*r, z)
        ptsOnSphere.append( ptNew )
        z    = z - dz
        long = long + dlong
    return ptsOnSphere

if __name__ == '__main__':                
    ptsOnSphere = GetPointsEquiAngularlyDistancedOnSphere( 80)    

    #toggle True/False to print them
    if( True ):    
        for pt in ptsOnSphere:  print( pt)

    #toggle True/False to plot them
    if(True):
        from numpy import *
        import pylab as p
        import mpl_toolkits.mplot3d.axes3d as p3

        fig=p.figure()
        ax = p3.Axes3D(fig)

        x_s=[];y_s=[]; z_s=[]

        for pt in ptsOnSphere:
            x_s.append( pt[0]); y_s.append( pt[1]); z_s.append( pt[2])

        ax.scatter3D( array( x_s), array( y_s), array( z_s) )                
        ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
        p.show()
        #end

tested at low counts (N in 2, 5, 7, 13, etc) and seems to work ‘nice’


回答 6

尝试:

function sphere ( N:float,k:int):Vector3 {
    var inc =  Mathf.PI  * (3 - Mathf.Sqrt(5));
    var off = 2 / N;
    var y = k * off - 1 + (off / 2);
    var r = Mathf.Sqrt(1 - y*y);
    var phi = k * inc;
    return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r); 
};

上面的函数应该以N个循环总数和k个循环电流迭代的形式循环运行。

它基于向日葵种子模式,除了将向日葵种子弯曲成半个圆顶,再弯曲成一个球体。

这是一张照片,除了我将相机放置在球体内的一半位置之外,所以看起来是2d而不是3d,因为相机到所有点的距离都相同。 http://3.bp.blogspot.com/-9lbPHLccQHA/USXf88_bvVI/AAAAAAAAADY/j7qhQsSZsA8/s640/sphere.jpg

Try:

function sphere ( N:float,k:int):Vector3 {
    var inc =  Mathf.PI  * (3 - Mathf.Sqrt(5));
    var off = 2 / N;
    var y = k * off - 1 + (off / 2);
    var r = Mathf.Sqrt(1 - y*y);
    var phi = k * inc;
    return Vector3((Mathf.Cos(phi)*r), y, Mathf.Sin(phi)*r); 
};

The above function should run in loop with N loop total and k loop current iteration.

It is based on a sunflower seeds pattern, except the sunflower seeds are curved around into a half dome, and again into a sphere.

Here is a picture, except I put the camera half way inside the sphere so it looks 2d instead of 3d because the camera is same distance from all points. http://3.bp.blogspot.com/-9lbPHLccQHA/USXf88_bvVI/AAAAAAAAADY/j7qhQsSZsA8/s640/sphere.jpg


回答 7

Healpix解决了一个密切相关的问题(用相等面积的像素对球体进行像素化):

http://healpix.sourceforge.net/

这可能是过大了,但是也许看了之后,您就会意识到它的其他一些不错的特性对您很有趣。它不仅仅是输出点云的函数。

我降落在这里试图再次找到它。名称“ healpix”并不完全引起球体…

Healpix solves a closely related problem (pixelating the sphere with equal area pixels):

http://healpix.sourceforge.net/

It’s probably overkill, but maybe after looking at it you’ll realize some of it’s other nice properties are interesting to you. It’s way more than just a function that outputs a point cloud.

I landed here trying to find it again; the name “healpix” doesn’t exactly evoke spheres…


回答 8

仅需少量点就可以运行模拟:

from random import random,randint
r = 10
n = 20
best_closest_d = 0
best_points = []
points = [(r,0,0) for i in range(n)]
for simulation in range(10000):
    x = random()*r
    y = random()*r
    z = r-(x**2+y**2)**0.5
    if randint(0,1):
        x = -x
    if randint(0,1):
        y = -y
    if randint(0,1):
        z = -z
    closest_dist = (2*r)**2
    closest_index = None
    for i in range(n):
        for j in range(n):
            if i==j:
                continue
            p1,p2 = points[i],points[j]
            x1,y1,z1 = p1
            x2,y2,z2 = p2
            d = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
            if d < closest_dist:
                closest_dist = d
                closest_index = i
    if simulation % 100 == 0:
        print simulation,closest_dist
    if closest_dist > best_closest_d:
        best_closest_d = closest_dist
        best_points = points[:]
    points[closest_index]=(x,y,z)


print best_points
>>> best_points
[(9.921692138442777, -9.930808529773849, 4.037839326088124),
 (5.141893371460546, 1.7274947332807744, -4.575674650522637),
 (-4.917695758662436, -1.090127967097737, -4.9629263893193745),
 (3.6164803265540666, 7.004158551438312, -2.1172868271109184),
 (-9.550655088997003, -9.580386054762917, 3.5277052594769422),
 (-0.062238110294250415, 6.803105171979587, 3.1966101417463655),
 (-9.600996012203195, 9.488067284474834, -3.498242301168819),
 (-8.601522086624803, 4.519484132245867, -0.2834204048792728),
 (-1.1198210500791472, -2.2916581379035694, 7.44937337008726),
 (7.981831370440529, 8.539378431788634, 1.6889099589074377),
 (0.513546008372332, -2.974333486904779, -6.981657873262494),
 (-4.13615438946178, -6.707488383678717, 2.1197605651446807),
 (2.2859494919024326, -8.14336582650039, 1.5418694699275672),
 (-7.241410895247996, 9.907335206038226, 2.271647103735541),
 (-9.433349952523232, -7.999106443463781, -2.3682575660694347),
 (3.704772125650199, 1.0526567864085812, 6.148581714099761),
 (-3.5710511242327048, 5.512552040316693, -3.4318468250897647),
 (-7.483466337225052, -1.506434920354559, 2.36641535124918),
 (7.73363824231576, -8.460241422163824, -1.4623228616326003),
 (10, 0, 0)]

with small numbers of points you could run a simulation:

from random import random,randint
r = 10
n = 20
best_closest_d = 0
best_points = []
points = [(r,0,0) for i in range(n)]
for simulation in range(10000):
    x = random()*r
    y = random()*r
    z = r-(x**2+y**2)**0.5
    if randint(0,1):
        x = -x
    if randint(0,1):
        y = -y
    if randint(0,1):
        z = -z
    closest_dist = (2*r)**2
    closest_index = None
    for i in range(n):
        for j in range(n):
            if i==j:
                continue
            p1,p2 = points[i],points[j]
            x1,y1,z1 = p1
            x2,y2,z2 = p2
            d = (x1-x2)**2+(y1-y2)**2+(z1-z2)**2
            if d < closest_dist:
                closest_dist = d
                closest_index = i
    if simulation % 100 == 0:
        print simulation,closest_dist
    if closest_dist > best_closest_d:
        best_closest_d = closest_dist
        best_points = points[:]
    points[closest_index]=(x,y,z)


print best_points
>>> best_points
[(9.921692138442777, -9.930808529773849, 4.037839326088124),
 (5.141893371460546, 1.7274947332807744, -4.575674650522637),
 (-4.917695758662436, -1.090127967097737, -4.9629263893193745),
 (3.6164803265540666, 7.004158551438312, -2.1172868271109184),
 (-9.550655088997003, -9.580386054762917, 3.5277052594769422),
 (-0.062238110294250415, 6.803105171979587, 3.1966101417463655),
 (-9.600996012203195, 9.488067284474834, -3.498242301168819),
 (-8.601522086624803, 4.519484132245867, -0.2834204048792728),
 (-1.1198210500791472, -2.2916581379035694, 7.44937337008726),
 (7.981831370440529, 8.539378431788634, 1.6889099589074377),
 (0.513546008372332, -2.974333486904779, -6.981657873262494),
 (-4.13615438946178, -6.707488383678717, 2.1197605651446807),
 (2.2859494919024326, -8.14336582650039, 1.5418694699275672),
 (-7.241410895247996, 9.907335206038226, 2.271647103735541),
 (-9.433349952523232, -7.999106443463781, -2.3682575660694347),
 (3.704772125650199, 1.0526567864085812, 6.148581714099761),
 (-3.5710511242327048, 5.512552040316693, -3.4318468250897647),
 (-7.483466337225052, -1.506434920354559, 2.36641535124918),
 (7.73363824231576, -8.460241422163824, -1.4623228616326003),
 (10, 0, 0)]

回答 9

以您的两个最大因素为准N,如果N==20这两个最大因素是{5,4}或更普遍的话{a,b}。计算

dlat  = 180/(a+1)
dlong = 360/(b+1})

把你的第一个点{90-dlat/2,(dlong/2)-180},第二个在{90-dlat/2,(3*dlong/2)-180},你在第3次{90-dlat/2,(5*dlong/2)-180},直到你绊倒环游世界一次,此时你一定要了解{75,150},当你去旁边{90-3*dlat/2,(dlong/2)-180}

显然,我正在按球形地球表面上的度数进行此操作,使用了将+/-转换为N / S或E / W的常规约定。显然,这给了您一个完全非随机的分布,但是它是均匀的,并且这些点不会聚集在一起。

要增加一定程度的随机性,您可以生成2个正态分布(均值0和std dev分别为{dlat / 3,dlong / 3})并将它们添加到均匀分布的点上。

Take the two largest factors of your N, if N==20 then the two largest factors are {5,4}, or, more generally {a,b}. Calculate

dlat  = 180/(a+1)
dlong = 360/(b+1})

Put your first point at {90-dlat/2,(dlong/2)-180}, your second at {90-dlat/2,(3*dlong/2)-180}, your 3rd at {90-dlat/2,(5*dlong/2)-180}, until you’ve tripped round the world once, by which time you’ve got to about {75,150} when you go next to {90-3*dlat/2,(dlong/2)-180}.

Obviously I’m working this in degrees on the surface of the spherical earth, with the usual conventions for translating +/- to N/S or E/W. And obviously this gives you a completely non-random distribution, but it is uniform and the points are not bunched together.

To add some degree of randomness, you could generate 2 normally-distributed (with mean 0 and std dev of {dlat/3, dlong/3} as appropriate) and add them to your uniformly distributed points.


回答 10

编辑:这不能回答OP想要问的问题,请留在这里,以防人们发现它有用。

我们使用概率的乘法规则,并结合无穷小。这将导致两行代码来实现所需的结果:

longitude: φ = uniform([0,2pi))
azimuth:   θ = -arcsin(1 - 2*uniform([0,1]))

(在以下坐标系中定义:)

您的语言通常具有统一的随机数基元。例如,在python中,您可以使用random.random()返回范围内的数字[0,1)。您可以将此数字乘以k以得到范围内的随机数[0,k)。因此在python中,uniform([0,2pi))将表示random.random()*2*math.pi


证明

现在我们不能均匀地分配θ,否则我们将陷入困境。我们希望分配与球面楔形的表面积成比例的概率(此图中的θ实际上为φ):

在赤道的角位移dφ将导致dφ* r的位移。在任意方位角θ处的位移将是什么?好吧,距z轴的半径为r*sin(θ),因此与楔形相交的“纬度”的弧长为dφ * r*sin(θ)。因此我们计算累积分布,通过对从南极到北极的切片面积进行积分,要采样的区域。

(其中stuff = dφ*r

现在,我们将尝试从中获取CDF的逆样本:http : //en.wikipedia.org/wiki/Inverse_transform_sampling

首先,我们将几乎CDF除以最大值进行归一化。这具有抵消dφ和r的副作用。

azimuthalCDF: cumProb = (sin(θ)+1)/2 from -pi/2 to pi/2

inverseCDF: θ = -sin^(-1)(1 - 2*cumProb)

从而:

let x by a random float in range [0,1]
θ = -arcsin(1-2*x)

edit: This does not answer the question the OP meant to ask, leaving it here in case people find it useful somehow.

We use the multiplication rule of probability, combined with infinitessimals. This results in 2 lines of code to achieve your desired result:

longitude: φ = uniform([0,2pi))
azimuth:   θ = -arcsin(1 - 2*uniform([0,1]))

(defined in the following coordinate system:)

Your language typically has a uniform random number primitive. For example in python you can use random.random() to return a number in the range [0,1). You can multiply this number by k to get a random number in the range [0,k). Thus in python, uniform([0,2pi)) would mean random.random()*2*math.pi.


Proof

Now we can’t assign θ uniformly, otherwise we’d get clumping at the poles. We wish to assign probabilities proportional to the surface area of the spherical wedge (the θ in this diagram is actually φ):

An angular displacement dφ at the equator will result in a displacement of dφ*r. What will that displacement be at an arbitrary azimuth θ? Well, the radius from the z-axis is r*sin(θ), so the arclength of that “latitude” intersecting the wedge is dφ * r*sin(θ). Thus we calculate the cumulative distribution of the area to sample from it, by integrating the area of the slice from the south pole to the north pole.

(where stuff=dφ*r)

We will now attempt to get the inverse of the CDF to sample from it: http://en.wikipedia.org/wiki/Inverse_transform_sampling

First we normalize by dividing our almost-CDF by its maximum value. This has the side-effect of cancelling out the dφ and r.

azimuthalCDF: cumProb = (sin(θ)+1)/2 from -pi/2 to pi/2

inverseCDF: θ = -sin^(-1)(1 - 2*cumProb)

Thus:

let x by a random float in range [0,1]
θ = -arcsin(1-2*x)

回答 11

或…放置20个点,计算二十面体面的中心。对于12点,找到二十面体的顶点。对于30点,是二十面体边缘的中点。您可以对四面体,立方体,十二面体和八面体执行相同的操作:一组点位于顶点上,另一组点位于面的中心,另一组点位于边的中心。但是,不能将它们混合在一起。

OR… to place 20 points, compute the centers of the icosahedronal faces. For 12 points, find the vertices of the icosahedron. For 30 points, the mid point of the edges of the icosahedron. you can do the same thing with the tetrahedron, cube, dodecahedron and octahedrons: one set of points is on the vertices, another on the center of the face and another on the center of the edges. They cannot be mixed, however.


回答 12

# create uniform spiral grid
numOfPoints = varargin[0]
vxyz = zeros((numOfPoints,3),dtype=float)
sq0 = 0.00033333333**2
sq2 = 0.9999998**2
sumsq = 2*sq0 + sq2
vxyz[numOfPoints -1] = array([(sqrt(sq0/sumsq)), 
                              (sqrt(sq0/sumsq)), 
                              (-sqrt(sq2/sumsq))])
vxyz[0] = -vxyz[numOfPoints -1] 
phi2 = sqrt(5)*0.5 + 2.5
rootCnt = sqrt(numOfPoints)
prevLongitude = 0
for index in arange(1, (numOfPoints -1), 1, dtype=float):
  zInc = (2*index)/(numOfPoints) -1
  radius = sqrt(1-zInc**2)

  longitude = phi2/(rootCnt*radius)
  longitude = longitude + prevLongitude
  while (longitude > 2*pi): 
    longitude = longitude - 2*pi

  prevLongitude = longitude
  if (longitude > pi):
    longitude = longitude - 2*pi

  latitude = arccos(zInc) - pi/2
  vxyz[index] = array([ (cos(latitude) * cos(longitude)) ,
                        (cos(latitude) * sin(longitude)), 
                        sin(latitude)])
# create uniform spiral grid
numOfPoints = varargin[0]
vxyz = zeros((numOfPoints,3),dtype=float)
sq0 = 0.00033333333**2
sq2 = 0.9999998**2
sumsq = 2*sq0 + sq2
vxyz[numOfPoints -1] = array([(sqrt(sq0/sumsq)), 
                              (sqrt(sq0/sumsq)), 
                              (-sqrt(sq2/sumsq))])
vxyz[0] = -vxyz[numOfPoints -1] 
phi2 = sqrt(5)*0.5 + 2.5
rootCnt = sqrt(numOfPoints)
prevLongitude = 0
for index in arange(1, (numOfPoints -1), 1, dtype=float):
  zInc = (2*index)/(numOfPoints) -1
  radius = sqrt(1-zInc**2)

  longitude = phi2/(rootCnt*radius)
  longitude = longitude + prevLongitude
  while (longitude > 2*pi): 
    longitude = longitude - 2*pi

  prevLongitude = longitude
  if (longitude > pi):
    longitude = longitude - 2*pi

  latitude = arccos(zInc) - pi/2
  vxyz[index] = array([ (cos(latitude) * cos(longitude)) ,
                        (cos(latitude) * sin(longitude)), 
                        sin(latitude)])

回答 13

@robert king这是一个非常不错的解决方案,但其中包含一些草率的错误。我知道它对我有很大帮助,所以不要介意草率。:)这是一个清理的版本。

from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2

lat_dist = lambda lat, R=1.0: R*(1-sin(lat))

#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
    if -halfpi > lat or lat > halfpi:
        raise ValueError("lat must be between -halfpi and halfpi")
    return 2 * pi * R ** 2 * (1-sin(lat))

sphere_lonarea = lambda lon, R=1.0: \
        4 * pi * R ** 2 * lon / twopi

#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
#    = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
        (sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi


def test_sphere(n_lats=10, n_lons=19, radius=540.0):
    total_area = 0.0
    for i_lons in range(n_lons):
        lon0 = twopi * float(i_lons) / n_lons
        lon1 = twopi * float(i_lons+1) / n_lons
        for i_lats in range(n_lats):
            lat0 = asin(2 * float(i_lats) / n_lats - 1)
            lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
            area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
            print("{:} {:}: {:9.4f} to  {:9.4f}, {:9.4f} to  {:9.4f} => area {:10.4f}"
                    .format(i_lats, i_lons
                    , degrees(lat0), degrees(lat1)
                    , degrees(lon0), degrees(lon1)
                    , area))
            total_area += area
    print("total_area = {:10.4f} (difference of {:10.4f})"
            .format(total_area, abs(total_area) - sphere_area(radius)))

test_sphere()

@robert king It’s a really nice solution but has some sloppy bugs in it. I know it helped me a lot though, so never mind the sloppiness. :) Here is a cleaned up version….

from math import pi, asin, sin, degrees
halfpi, twopi = .5 * pi, 2 * pi
sphere_area = lambda R=1.0: 4 * pi * R ** 2

lat_dist = lambda lat, R=1.0: R*(1-sin(lat))

#A = 2*pi*R^2(1-sin(lat))
def sphere_latarea(lat, R=1.0):
    if -halfpi > lat or lat > halfpi:
        raise ValueError("lat must be between -halfpi and halfpi")
    return 2 * pi * R ** 2 * (1-sin(lat))

sphere_lonarea = lambda lon, R=1.0: \
        4 * pi * R ** 2 * lon / twopi

#A = 2*pi*R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
#    = (pi/180)R^2 |sin(lat1)-sin(lat2)| |lon1-lon2|
sphere_rectarea = lambda lat0, lat1, lon0, lon1, R=1.0: \
        (sphere_latarea(lat0, R)-sphere_latarea(lat1, R)) * (lon1-lon0) / twopi


def test_sphere(n_lats=10, n_lons=19, radius=540.0):
    total_area = 0.0
    for i_lons in range(n_lons):
        lon0 = twopi * float(i_lons) / n_lons
        lon1 = twopi * float(i_lons+1) / n_lons
        for i_lats in range(n_lats):
            lat0 = asin(2 * float(i_lats) / n_lats - 1)
            lat1 = asin(2 * float(i_lats+1)/n_lats - 1)
            area = sphere_rectarea(lat0, lat1, lon0, lon1, radius)
            print("{:} {:}: {:9.4f} to  {:9.4f}, {:9.4f} to  {:9.4f} => area {:10.4f}"
                    .format(i_lats, i_lons
                    , degrees(lat0), degrees(lat1)
                    , degrees(lon0), degrees(lon1)
                    , area))
            total_area += area
    print("total_area = {:10.4f} (difference of {:10.4f})"
            .format(total_area, abs(total_area) - sphere_area(radius)))

test_sphere()

回答 14

这行得通,而且非常简单。您想要的点数:

    private function moveTweets():void {


        var newScale:Number=Scale(meshes.length,50,500,6,2);
        trace("new scale:"+newScale);


        var l:Number=this.meshes.length;
        var tweetMeshInstance:TweetMesh;
        var destx:Number;
        var desty:Number;
        var destz:Number;
        for (var i:Number=0;i<this.meshes.length;i++){

            tweetMeshInstance=meshes[i];

            var phi:Number = Math.acos( -1 + ( 2 * i ) / l );
            var theta:Number = Math.sqrt( l * Math.PI ) * phi;

            tweetMeshInstance.origX = (sphereRadius+5) * Math.cos( theta ) * Math.sin( phi );
            tweetMeshInstance.origY= (sphereRadius+5) * Math.sin( theta ) * Math.sin( phi );
            tweetMeshInstance.origZ = (sphereRadius+5) * Math.cos( phi );

            destx=sphereRadius * Math.cos( theta ) * Math.sin( phi );
            desty=sphereRadius * Math.sin( theta ) * Math.sin( phi );
            destz=sphereRadius * Math.cos( phi );

            tweetMeshInstance.lookAt(new Vector3D());


            TweenMax.to(tweetMeshInstance, 1, {scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});

        }

    }
    private function onLookAtTween(theMesh:TweetMesh):void {
        theMesh.lookAt(new Vector3D());
    }

This works and it’s deadly simple. As many points as you want:

    private function moveTweets():void {


        var newScale:Number=Scale(meshes.length,50,500,6,2);
        trace("new scale:"+newScale);


        var l:Number=this.meshes.length;
        var tweetMeshInstance:TweetMesh;
        var destx:Number;
        var desty:Number;
        var destz:Number;
        for (var i:Number=0;i<this.meshes.length;i++){

            tweetMeshInstance=meshes[i];

            var phi:Number = Math.acos( -1 + ( 2 * i ) / l );
            var theta:Number = Math.sqrt( l * Math.PI ) * phi;

            tweetMeshInstance.origX = (sphereRadius+5) * Math.cos( theta ) * Math.sin( phi );
            tweetMeshInstance.origY= (sphereRadius+5) * Math.sin( theta ) * Math.sin( phi );
            tweetMeshInstance.origZ = (sphereRadius+5) * Math.cos( phi );

            destx=sphereRadius * Math.cos( theta ) * Math.sin( phi );
            desty=sphereRadius * Math.sin( theta ) * Math.sin( phi );
            destz=sphereRadius * Math.cos( phi );

            tweetMeshInstance.lookAt(new Vector3D());


            TweenMax.to(tweetMeshInstance, 1, {scaleX:newScale,scaleY:newScale,x:destx,y:desty,z:destz,onUpdate:onLookAtTween, onUpdateParams:[tweetMeshInstance]});

        }

    }
    private function onLookAtTween(theMesh:TweetMesh):void {
        theMesh.lookAt(new Vector3D());
    }

Python中的模块化乘法逆函数

问题:Python中的模块化乘法逆函数

一些标准的Python模块是否包含用于计算数字(即诸如)的模数乘法逆的函数?Google似乎对此没有任何好的暗示。y = invmod(x, p)x*y == 1 (mod p)

当然,可以提出扩展的欧几里得算法的自酿10线性算法,但是为什么要重新发明轮子呢?

例如,Java的BigIntegerhas modInverse方法。Python没有类似的东西吗?

Does some standard Python module contain a function to compute modular multiplicative inverse of a number, i.e. a number y = invmod(x, p) such that x*y == 1 (mod p)? Google doesn’t seem to give any good hints on this.

Of course, one can come up with home-brewed 10-liner of extended Euclidean algorithm, but why reinvent the wheel.

For example, Java’s BigInteger has modInverse method. Doesn’t Python have something similar?


回答 0

也许有人会觉得这很有用(来自Wikibooks):

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m

Maybe someone will find this useful (from wikibooks):

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m

回答 1

如果您的模数是素数(称为p),则可以简单地计算:

y = x**(p-2) mod p  # Pseudocode

或者在Python中:

y = pow(x, p-2, p)

这是已经在Python中实现了一些数论功能的人:http : //www.math.umbc.edu/~campbell/Computers/Python/numbthy.html

这是在提示符下完成的示例:

m = 1000000007
x = 1234567
y = pow(x,m-2,m)
y
989145189L
x*y
1221166008548163L
x*y % m
1L

If your modulus is prime (you call it p) then you may simply compute:

y = x**(p-2) mod p  # Pseudocode

Or in Python proper:

y = pow(x, p-2, p)

Here is someone who has implemented some number theory capabilities in Python: http://www.math.umbc.edu/~campbell/Computers/Python/numbthy.html

Here is an example done at the prompt:

m = 1000000007
x = 1234567
y = pow(x,m-2,m)
y
989145189L
x*y
1221166008548163L
x*y % m
1L

回答 2

您可能还需要查看gmpy模块。它是Python和GMP多重精度库之间的接口。gmpy提供了一个invert函数,可以完全满足您的需求:

>>> import gmpy
>>> gmpy.invert(1234567, 1000000007)
mpz(989145189)

更新的答案

如@hyh所示,gmpy.invert()如果不存在逆,则返回0。符合GMP mpz_invert()功能的行为。gmpy.divm(a, b, m)提供的一般解决方案a=bx (mod m)

>>> gmpy.divm(1, 1234567, 1000000007)
mpz(989145189)
>>> gmpy.divm(1, 0, 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 9)
mpz(7)

divm()gcd(b,m) == 1不存在乘法逆时,将返回一个解决方案,并引发异常。

免责声明:我是gmpy库的当前维护者。

更新的答案2

现在,当反函数不存在时,gmpy2会正确引发一个异常:

>>> import gmpy2

>>> gmpy2.invert(0,5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: invert() no inverse exists

You might also want to look at the gmpy module. It is an interface between Python and the GMP multiple-precision library. gmpy provides an invert function that does exactly what you need:

>>> import gmpy
>>> gmpy.invert(1234567, 1000000007)
mpz(989145189)

Updated answer

As noted by @hyh , the gmpy.invert() returns 0 if the inverse does not exist. That matches the behavior of GMP’s mpz_invert() function. gmpy.divm(a, b, m) provides a general solution to a=bx (mod m).

>>> gmpy.divm(1, 1234567, 1000000007)
mpz(989145189)
>>> gmpy.divm(1, 0, 5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 9)
mpz(7)

divm() will return a solution when gcd(b,m) == 1 and raises an exception when the multiplicative inverse does not exist.

Disclaimer: I’m the current maintainer of the gmpy library.

Updated answer 2

gmpy2 now properly raises an exception when the inverse does not exists:

>>> import gmpy2

>>> gmpy2.invert(0,5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: invert() no inverse exists

回答 3

从3.8开始,python的pow()函数可以采用一个模数和一个负整数。看这里。他们如何使用它的情况是

>>> pow(38, -1, 97)
23
>>> 23 * 38 % 97 == 1
True

As of 3.8 pythons pow() function can take a modulus and a negative integer. See here. Their case for how to use it is

>>> pow(38, -1, 97)
23
>>> 23 * 38 % 97 == 1
True

回答 4

这是CodeFights的一线工具;它是最短的解决方案之一:

MMI = lambda A, n,s=1,t=0,N=0: (n < 2 and t%N or MMI(n, A%n, t, s-A//n*t, N or n),-1)[n<1]

-1如果A在中没有乘法逆,它将返回n

用法:

MMI(23, 99) # returns 56
MMI(18, 24) # return -1

该解决方案使用扩展的欧几里得算法

Here is a one-liner for CodeFights; it is one of the shortest solutions:

MMI = lambda A, n,s=1,t=0,N=0: (n < 2 and t%N or MMI(n, A%n, t, s-A//n*t, N or n),-1)[n<1]

It will return -1 if A has no multiplicative inverse in n.

Usage:

MMI(23, 99) # returns 56
MMI(18, 24) # return -1

The solution uses the Extended Euclidean Algorithm.


回答 5

Sympy,为象征性的数学Python模块,具有内置式模块反函数,如果你不希望实现自己的(或者,如果您已经在使用Sympy):

from sympy import mod_inverse

mod_inverse(11, 35) # returns 16
mod_inverse(15, 35) # raises ValueError: 'inverse of 15 (mod 35) does not exist'

这似乎没有在Sympy网站上记录,但这里是文档字符串:Github上的Sympy mod_inverse文档字符串

Sympy, a python module for symbolic mathematics, has a built-in modular inverse function if you don’t want to implement your own (or if you’re using Sympy already):

from sympy import mod_inverse

mod_inverse(11, 35) # returns 16
mod_inverse(15, 35) # raises ValueError: 'inverse of 15 (mod 35) does not exist'

This doesn’t seem to be documented on the Sympy website, but here’s the docstring: Sympy mod_inverse docstring on Github


回答 6

这是我的代码,可能有点草率,但是无论如何它似乎对我有用。

# a is the number you want the inverse for
# b is the modulus

def mod_inverse(a, b):
    r = -1
    B = b
    A = a
    eq_set = []
    full_set = []
    mod_set = []

    #euclid's algorithm
    while r!=1 and r!=0:
        r = b%a
        q = b//a
        eq_set = [r, b, a, q*-1]
        b = a
        a = r
        full_set.append(eq_set)

    for i in range(0, 4):
        mod_set.append(full_set[-1][i])

    mod_set.insert(2, 1)
    counter = 0

    #extended euclid's algorithm
    for i in range(1, len(full_set)):
        if counter%2 == 0:
            mod_set[2] = full_set[-1*(i+1)][3]*mod_set[4]+mod_set[2]
            mod_set[3] = full_set[-1*(i+1)][1]

        elif counter%2 != 0:
            mod_set[4] = full_set[-1*(i+1)][3]*mod_set[2]+mod_set[4]
            mod_set[1] = full_set[-1*(i+1)][1]

        counter += 1

    if mod_set[3] == B:
        return mod_set[2]%B
    return mod_set[4]%B

Here is my code, it might be sloppy but it seems to work for me anyway.

# a is the number you want the inverse for
# b is the modulus

def mod_inverse(a, b):
    r = -1
    B = b
    A = a
    eq_set = []
    full_set = []
    mod_set = []

    #euclid's algorithm
    while r!=1 and r!=0:
        r = b%a
        q = b//a
        eq_set = [r, b, a, q*-1]
        b = a
        a = r
        full_set.append(eq_set)

    for i in range(0, 4):
        mod_set.append(full_set[-1][i])

    mod_set.insert(2, 1)
    counter = 0

    #extended euclid's algorithm
    for i in range(1, len(full_set)):
        if counter%2 == 0:
            mod_set[2] = full_set[-1*(i+1)][3]*mod_set[4]+mod_set[2]
            mod_set[3] = full_set[-1*(i+1)][1]

        elif counter%2 != 0:
            mod_set[4] = full_set[-1*(i+1)][3]*mod_set[2]+mod_set[4]
            mod_set[1] = full_set[-1*(i+1)][1]

        counter += 1

    if mod_set[3] == B:
        return mod_set[2]%B
    return mod_set[4]%B

回答 7

上面的代码无法在python3中运行,并且与GCD变体相比效率较低。但是,此代码非常透明。它触发了我创建一个更紧凑的版本:

def imod(a, n):
 c = 1
 while (c % a > 0):
     c += n
 return c // a

The code above will not run in python3 and is less efficient compared to the GCD variants. However, this code is very transparent. It triggered me to create a more compact version:

def imod(a, n):
 c = 1
 while (c % a > 0):
     c += n
 return c // a

回答 8

这是一个简洁的1-liner,无需使用任何外部库即可执行此操作。

# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).
# In particular, if a,b are relatively prime, returns the inverse of a modulo b.
def invmod(a,b): return 0 if a==0 else 1 if b%a==0 else b - invmod(b%a,a)*b//a

请注意,这实际上只是egcd,经过精简后仅返回单个感兴趣的系数。

Here is a concise 1-liner that does it, without using any external libraries.

# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).
# In particular, if a,b are relatively prime, returns the inverse of a modulo b.
def invmod(a,b): return 0 if a==0 else 1 if b%a==0 else b - invmod(b%a,a)*b//a

Note that this is really just egcd, streamlined to return only the single coefficient of interest.


回答 9

为了弄清楚模块化乘法逆,我建议使用扩展欧几里得算法,如下所示:

def multiplicative_inverse(a, b):
    origA = a
    X = 0
    prevX = 1
    Y = 1
    prevY = 0
    while b != 0:
        temp = b
        quotient = a/b
        b = a%b
        a = temp
        temp = X
        a = prevX - quotient * X
        prevX = temp
        temp = Y
        Y = prevY - quotient * Y
        prevY = temp

    return origA + prevY

To figure out the modular multiplicative inverse I recommend using the Extended Euclidean Algorithm like this:

def multiplicative_inverse(a, b):
    origA = a
    X = 0
    prevX = 1
    Y = 1
    prevY = 0
    while b != 0:
        temp = b
        quotient = a/b
        b = a%b
        a = temp
        temp = X
        a = prevX - quotient * X
        prevX = temp
        temp = Y
        Y = prevY - quotient * Y
        prevY = temp

    return origA + prevY

回答 10

我从该线程尝试了不同的解决方案,最后我使用了一个:

def egcd(a, b):
    lastremainder, remainder = abs(a), abs(b)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if a < 0 else 1), lasty * (-1 if b < 0 else 1)


def modinv(a, m):
    g, x, y = self.egcd(a, m)
    if g != 1:
        raise ValueError('modinv for {} does not exist'.format(a))
    return x % m

Python中的Modular_inverse

I try different solutions from this thread and in the end I use this one:

def egcd(a, b):
    lastremainder, remainder = abs(a), abs(b)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if a < 0 else 1), lasty * (-1 if b < 0 else 1)


def modinv(a, m):
    g, x, y = self.egcd(a, m)
    if g != 1:
        raise ValueError('modinv for {} does not exist'.format(a))
    return x % m

Modular_inverse in Python


回答 11

好吧,我在python中没有函数,但是我在C中具有可以轻松转换为python的函数,在下面的c函数中,扩展的欧几里得算法用于计算逆mod。

int imod(int a,int n){
int c,i=1;
while(1){
    c = n * i + 1;
    if(c%a==0){
        c = c/a;
        break;
    }
    i++;
}
return c;}

Python函数

def imod(a,n):
  i=1
  while True:
    c = n * i + 1;
    if(c%a==0):
      c = c/a
      break;
    i = i+1

  return c

从下面的链接C程序中引用了上面的C函数,以找到两个相对质数的模乘逆

Well, I don’t have a function in python but I have a function in C which you can easily convert to python, in the below c function extended euclidian algorithm is used to calculate inverse mod.

int imod(int a,int n){
int c,i=1;
while(1){
    c = n * i + 1;
    if(c%a==0){
        c = c/a;
        break;
    }
    i++;
}
return c;}

Python Function

def imod(a,n):
  i=1
  while True:
    c = n * i + 1;
    if(c%a==0):
      c = c/a
      break;
    i = i+1

  return c

Reference to the above C function is taken from the following link C program to find Modular Multiplicative Inverse of two Relatively Prime Numbers


回答 12

从cpython实现源代码

def invmod(a, n):
    b, c = 1, 0
    while n:
        q, r = divmod(a, n)
        a, b, c, n = n, c, b - q*c, r
    # at this point a is the gcd of the original inputs
    if a == 1:
        return b
    raise ValueError("Not invertible")

根据此代码上方的注释,它可以返回小的负值,因此您可以潜在地检查是否为负,并在返回b之前将n添加为负。

from the cpython implementation source code:

def invmod(a, n):
    b, c = 1, 0
    while n:
        q, r = divmod(a, n)
        a, b, c, n = n, c, b - q*c, r
    # at this point a is the gcd of the original inputs
    if a == 1:
        return b
    raise ValueError("Not invertible")

according to the comment above this code, it can return small negative values, so you could potentially check if negative and add n when negative before returning b.


回答 13

截至2017年1月23日,上面的许多链接已断开。我找到了这个实现:https : //courses.csail.mit.edu/6.857/2016/files/ffield.py

Many of the links above are broken as for 1/23/2017. I found this implementation: https://courses.csail.mit.edu/6.857/2016/files/ffield.py


如何将没有空格的文本分割成单词列表?

问题:如何将没有空格的文本分割成单词列表?

输入: "tableapplechairtablecupboard..."很多单词

将此类文本拆分为单词列表并获得以下内容的有效算法是什么?

输出: ["table", "apple", "chair", "table", ["cupboard", ["cup", "board"]], ...]

首先想到的是遍历所有可能的单词(从第一个字母开始)并找到可能的最长单词,然后从 position=word_position+len(word)

PS:
我们列出了所有可能的单词。
单词“ cupboard”可以是“ cup”和“ board”,选择时间最长。
语言:python,但主要是算法本身。

Input: "tableapplechairtablecupboard..." many words

What would be an efficient algorithm to split such text to the list of words and get:

Output: ["table", "apple", "chair", "table", ["cupboard", ["cup", "board"]], ...]

First thing that cames to mind is to go through all possible words (starting with first letter) and find the longest word possible, continue from position=word_position+len(word)

P.S.
We have a list of all possible words.
Word “cupboard” can be “cup” and “board”, select longest.
Language: python, but main thing is the algorithm itself.


回答 0

如果将朴素的算法应用于现实世界的数据,则效果不佳。这是一条20行的算法,该算法利用相对词频来为实词文本提供准确的结果。

(如果您想回答不使用词频的原始问题的答案,则需要完善“最长词”的确切含义:最好是使用20个字母的单词和10个3字母的单词,还是最好有五个10个字母的单词?确定精确的定义后,只需更改定义的行wordcost即可反映出预期的含义。)

这个主意

最好的进行方法是对输出的分布进行建模。一个很好的第一近似是假设所有单词都是独立分布的。然后,您只需要知道所有单词的相对频率即可。可以合理地假设它们遵循Zipf定律,也就是说,单词列表中排名为n的单词的概率约为1 /(n log N),其中N是词典中的单词数。

固定模型后,可以使用动态编程来推断空间的位置。最可能的句子是最大化每个单词的概率乘积的句子,并且可以通过动态编程轻松地计算出该句子。代替直接使用概率,我们使用定义为概率倒数的对数的代价来避免溢出。

代码

from math import log

# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k

    return " ".join(reversed(out))

您可以使用

s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))

结果

我使用的是从Wikipedia的一小部分中拼凑而成的这个快速且肮脏的125k字字典

之前: thumbgreenappleactiveassignmentweekly隐喻。
之后:拇指绿苹果积极地分配每周的隐喻。

之前:有一种从html解析出的软注释信息,但是却重新获得了有限的字符内阁,例如,thumbgreenappleactiveassignmentlymetapho rapparentlytherearthumbgreenappleetc 字符串中应该用词条查询该词是否合理,这是最快的提取方法。

之后:从html解析出大量的人民评论文本信息,但是其中没有定界字符,例如,拇指绿苹果活跃作业每周隐喻显然在字符串中有拇指绿苹果等,我也有一个大词典查询单词是否合理,那么提取最快的方法是什么。

之前:夜晚是阴暗的暴风雨,除了偶然的间隔外,当被狂风吹过一阵子的时候,暴风雨席卷了整个伦敦的大街小巷,这在我们的幕后掠过,并迅速燃烧了灯火,直到黑暗中挣扎了。

之后:那是一个黑暗而暴风雨的夜晚,雨水倾盆而下,除了偶尔被狂风吹拂而扫过街道的间隔,因为在伦敦,我们的场景在房顶上嘎嘎作响,剧烈地搅动着房屋。挣扎着黑暗的灯火。

如您所见,它本质上是完美的。最重要的部分是确保将单词列表训练成与您实际遇到的相似的语料库,否则结果将很糟糕。


优化

该实现消耗线性时间和内存,因此它是相当有效的。如果需要进一步的提速,则可以从单词列表中构建后缀树,以减少候选集的大小。

如果您需要处理非常大的连续字符串,则合理的做法是拆分字符串以避免过多的内存使用。例如,您可以按10000个字符的块处理文本,并在每侧加1000个字符的边距,以避免边界效应。这将使内存使用降至最低,并且几乎可以肯定不会对质量产生任何影响。

A naive algorithm won’t give good results when applied to real-world data. Here is a 20-line algorithm that exploits relative word frequency to give accurate results for real-word text.

(If you want an answer to your original question which does not use word frequency, you need to refine what exactly is meant by “longest word”: is it better to have a 20-letter word and ten 3-letter words, or is it better to have five 10-letter words? Once you settle on a precise definition, you just have to change the line defining wordcost to reflect the intended meaning.)

The idea

The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf’s law, that is the word with rank n in the list of words has probability roughly 1/(n log N) where N is the number of words in the dictionary.

Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probability of each individual word, and it’s easy to compute it with dynamic programming. Instead of directly using the probability we use a cost defined as the logarithm of the inverse of the probability to avoid overflows.

The code

from math import log

# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k

    return " ".join(reversed(out))

which you can use with

s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))

The results

I am using this quick-and-dirty 125k-word dictionary I put together from a small subset of Wikipedia.

Before: thumbgreenappleactiveassignmentweeklymetaphor.
After: thumb green apple active assignment weekly metaphor.

Before: thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearen odelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetapho rapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquery whetherthewordisreasonablesowhatsthefastestwayofextractionthxalot.

After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.

Before: itwasadarkandstormynighttherainfellintorrentsexceptatoccasionalintervalswhenitwascheckedbyaviolentgustofwindwhichsweptupthestreetsforitisinlondonthatoursceneliesrattlingalongthehousetopsandfiercelyagitatingthescantyflameofthelampsthatstruggledagainstthedarkness.

After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.

As you can see it is essentially flawless. The most important part is to make sure your word list was trained to a corpus similar to what you will actually encounter, otherwise the results will be very bad.


Optimization

The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.

If you need to process a very large consecutive string it would be reasonable to split the string to avoid excessive memory usage. For example you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will have almost certainly no effect on the quality.


回答 1

基于最佳答案中的出色工作,我创建了一个pip易于使用的软件包。

>>> import wordninja
>>> wordninja.split('derekanderson')
['derek', 'anderson']

要安装,请运行pip install wordninja

唯一的区别是微小的。这将返回a list而不是a str,它适用于python3,它包括单词列表并正确拆分,即使存在非字母字符(如下划线,破折号等)也是如此。

再次感谢通用人类!

https://github.com/keredson/wordninja

Based on the excellent work in the top answer, I’ve created a pip package for easy use.

>>> import wordninja
>>> wordninja.split('derekanderson')
['derek', 'anderson']

To install, run pip install wordninja.

The only differences are minor. This returns a list rather than a str, it works in python3, it includes the word list and properly splits even if there are non-alpha chars (like underscores, dashes, etc).

Thanks again to Generic Human!

https://github.com/keredson/wordninja


回答 2

这是使用递归搜索的解决方案:

def find_words(instring, prefix = '', words = None):
    if not instring:
        return []
    if words is None:
        words = set()
        with open('/usr/share/dict/words') as f:
            for line in f:
                words.add(line.strip())
    if (not prefix) and (instring in words):
        return [instring]
    prefix, suffix = prefix + instring[0], instring[1:]
    solutions = []
    # Case 1: prefix in solution
    if prefix in words:
        try:
            solutions.append([prefix] + find_words(suffix, '', words))
        except ValueError:
            pass
    # Case 2: prefix not in solution
    try:
        solutions.append(find_words(suffix, prefix, words))
    except ValueError:
        pass
    if solutions:
        return sorted(solutions,
                      key = lambda solution: [len(word) for word in solution],
                      reverse = True)[0]
    else:
        raise ValueError('no solution')

print(find_words('tableapplechairtablecupboard'))
print(find_words('tableprechaun', words = set(['tab', 'table', 'leprechaun'])))

Yield

['table', 'apple', 'chair', 'table', 'cupboard']
['tab', 'leprechaun']

Here is solution using recursive search:

def find_words(instring, prefix = '', words = None):
    if not instring:
        return []
    if words is None:
        words = set()
        with open('/usr/share/dict/words') as f:
            for line in f:
                words.add(line.strip())
    if (not prefix) and (instring in words):
        return [instring]
    prefix, suffix = prefix + instring[0], instring[1:]
    solutions = []
    # Case 1: prefix in solution
    if prefix in words:
        try:
            solutions.append([prefix] + find_words(suffix, '', words))
        except ValueError:
            pass
    # Case 2: prefix not in solution
    try:
        solutions.append(find_words(suffix, prefix, words))
    except ValueError:
        pass
    if solutions:
        return sorted(solutions,
                      key = lambda solution: [len(word) for word in solution],
                      reverse = True)[0]
    else:
        raise ValueError('no solution')

print(find_words('tableapplechairtablecupboard'))
print(find_words('tableprechaun', words = set(['tab', 'table', 'leprechaun'])))

yields

['table', 'apple', 'chair', 'table', 'cupboard']
['tab', 'leprechaun']

回答 3

使用一个trie 数据结构(该结构包含可能的单词的列表),执行以下操作不会太复杂:

  1. 前进指针(在串联字符串中)
  2. 查找并将相应的节点存储在trie中
  3. 如果trie节点有子节点(例如,单词较长),请转到1。
  4. 如果到达的节点没有孩子,则会出现最长的单词匹配;将单词(存储在节点中或在遍历遍历过程中只是串联在一起)添加到结果列表中,将指针重置为trie(或重置引用),然后重新开始

Using a trie data structure, which holds the list of possible words, it would not be too complicated to do the following:

  1. Advance pointer (in the concatenated string)
  2. Lookup and store the corresponding node in the trie
  3. If the trie node has children (e.g. there are longer words), go to 1.
  4. If the node reached has no children, a longest word match happened; add the word (stored in the node or just concatenated during trie traversal) to the result list, reset the pointer in the trie (or reset the reference), and start over

回答 4

Unutbu的解决方案非常接近,但是我发现代码难以阅读,并且没有产生预期的结果。通用人类解决方案的缺点是需要单词频率。不适用于所有用例。

这是一个使用分而治之算法的简单解决方案。

  1. 它试图使 Eg find_words('cupboard')返回['cupboard']而不是['cup', 'board'](假设cupboardcup并且board在字典中)的单词数量最少。
  2. 最佳解决方案不是唯一的,下面的实现返回一个解决方案。find_words('charactersin')可能会返回,['characters', 'in']或者可能会返回['character', 'sin'](如下所示)。您可以很容易地修改算法以返回所有最优解。
  3. 在此实现中,将记住解决方案,以使其在合理的时间内运行。

代码:

words = set()
with open('/usr/share/dict/words') as f:
    for line in f:
        words.add(line.strip())

solutions = {}
def find_words(instring):
    # First check if instring is in the dictionnary
    if instring in words:
        return [instring]
    # No... But maybe it's a result we already computed
    if instring in solutions:
        return solutions[instring]
    # Nope. Try to split the string at all position to recursively search for results
    best_solution = None
    for i in range(1, len(instring) - 1):
        part1 = find_words(instring[:i])
        part2 = find_words(instring[i:])
        # Both parts MUST have a solution
        if part1 is None or part2 is None:
            continue
        solution = part1 + part2
        # Is the solution found "better" than the previous one?
        if best_solution is None or len(solution) < len(best_solution):
            best_solution = solution
    # Remember (memoize) this solution to avoid having to recompute it
    solutions[instring] = best_solution
    return best_solution

在我的3GHz机器上,这大约需要5秒:

result = find_words("thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot")
assert(result is not None)
print ' '.join(result)

从html解析的人民评论文本信息的全部内容,但没有定界字符,例如,拇指绿苹果活动任务每周隐喻显然在字符串中有拇指绿苹果等,我也有一个大型词典来查询是否这个词很合理,所以提取很多最快的方法是什么

Unutbu’s solution was quite close but I find the code difficult to read, and it didn’t yield the expected result. Generic Human’s solution has the drawback that it needs word frequencies. Not appropriate for all use case.

Here’s a simple solution using a Divide and Conquer algorithm.

  1. It tries to minimize the number of words E.g. find_words('cupboard') will return ['cupboard'] rather than ['cup', 'board'] (assuming that cupboard, cup and board are in the dictionnary)
  2. The optimal solution is not unique, the implementation below returns a solution. find_words('charactersin') could return ['characters', 'in'] or maybe it will return ['character', 'sin'] (as seen below). You could quite easily modify the algorithm to return all optimal solutions.
  3. In this implementation solutions are memoized so that it runs in a reasonable time.

The code:

words = set()
with open('/usr/share/dict/words') as f:
    for line in f:
        words.add(line.strip())

solutions = {}
def find_words(instring):
    # First check if instring is in the dictionnary
    if instring in words:
        return [instring]
    # No... But maybe it's a result we already computed
    if instring in solutions:
        return solutions[instring]
    # Nope. Try to split the string at all position to recursively search for results
    best_solution = None
    for i in range(1, len(instring) - 1):
        part1 = find_words(instring[:i])
        part2 = find_words(instring[i:])
        # Both parts MUST have a solution
        if part1 is None or part2 is None:
            continue
        solution = part1 + part2
        # Is the solution found "better" than the previous one?
        if best_solution is None or len(solution) < len(best_solution):
            best_solution = solution
    # Remember (memoize) this solution to avoid having to recompute it
    solutions[instring] = best_solution
    return best_solution

This will take about about 5sec on my 3GHz machine:

result = find_words("thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot")
assert(result is not None)
print ' '.join(result)

the reis masses of text information of peoples comments which is parsed from h t m l but there are no delimited character sin them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple e t c in the string i also have a large dictionary to query whether the word is reasonable so whats the fastest way of extraction t h x a lot


回答 5

https://stackoverflow.com/users/1515832/generic-human的答案很棒。但是我见过的最好的实现是彼得·诺维格本人在他的《美丽的数据》一书中写的。

在粘贴他的代码之前,让我先解释一下为什么Norvig的方法更准确(尽管在代码方面会更慢一些,也更长一些)。

1)数据要好一些-就大小和精度而言(他使用字数而不是简单的排名)2)更重要的是,n-gram背后的逻辑确实使方法如此精确。

他在书中提供的示例是拆分字符串“ sitdown”的问题。现在,一个非二字组合的字符串拆分方法将考虑p(’sit’)* p(’down’),如果小于p(’sitdown’)-这种情况经常发生-它将不会拆分它,但我们希望(大多数时候)。

但是,当您具有bigram模型时,可以将p(’sit down’)视为bigram vs p(’sitdown’),并且前者获胜。基本上,如果您不使用双字母组,它将把您拆分的单词的概率视为独立,而不是这种情况,某些单词更有可能一个接一个地出现。不幸的是,这些词在很多情况下经常被粘在一起,使拆分器感到困惑。

这是数据的链接(它是3个独立问题的数据,而细分仅仅是一个。请阅读本章以获取详细信息):http : //norvig.com/ngrams/

这是代码的链接:http : //norvig.com/ngrams/ngrams.py

这些链接已经建立了一段时间,但是无论如何我都会在此处复制粘贴代码的分段部分

import re, string, random, glob, operator, heapq
from collections import defaultdict
from math import log10

def memo(f):
    "Memoize function f."
    table = {}
    def fmemo(*args):
        if args not in table:
            table[args] = f(*args)
        return table[args]
    fmemo.memo = table
    return fmemo

def test(verbose=None):
    """Run some tests, taken from the chapter.
    Since the hillclimbing algorithm is randomized, some tests may fail."""
    import doctest
    print 'Running tests...'
    doctest.testfile('ngrams-test.txt', verbose=verbose)

################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:]) 
            for i in range(min(len(text), L))]

def Pwords(words): 
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key): 
        if key in self: return self[key]/self.N  
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw  = Pdist(datafile('count_1w.txt'), N, avoid_long_words)

#### segment2: second version, with bigram counts, (p. 226-227)

def cPw(word, prev):
    "Conditional probability of word, given previous word."
    try:
        return P2w[prev + ' ' + word]/float(Pw[prev])
    except KeyError:
        return Pw(word)

P2w = Pdist(datafile('count_2w.txt'), N)

@memo 
def segment2(text, prev='<S>'): 
    "Return (log P(words), words), where words is the best segmentation." 
    if not text: return 0.0, [] 
    candidates = [combine(log10(cPw(first, prev)), first, segment2(rem, first)) 
                  for first,rem in splits(text)] 
    return max(candidates) 

def combine(Pfirst, first, (Prem, rem)): 
    "Combine first and rem results into one (probability, words) pair." 
    return Pfirst+Prem, [first]+rem 

The answer by https://stackoverflow.com/users/1515832/generic-human is great. But the best implementation of this I’ve ever seen was written Peter Norvig himself in his book ‘Beautiful Data’.

Before I paste his code, let me expand on why Norvig’s method is more accurate (although a little slower and longer in terms of code).

1) The data is a bit better – both in terms of size and in terms of precision (he uses a word count rather than a simple ranking) 2) More importantly, it’s the logic behind n-grams that really makes the approach so accurate.

The example he provides in his book is the problem of splitting a string ‘sitdown’. Now a non-bigram method of string split would consider p(‘sit’) * p (‘down’), and if this less than the p(‘sitdown’) – which will be the case quite often – it will NOT split it, but we’d want it to (most of the time).

However when you have the bigram model you could value p(‘sit down’) as a bigram vs p(‘sitdown’) and the former wins. Basically, if you don’t use bigrams, it treats the probability of the words you’re splitting as independent, which is not the case, some words are more likely to appear one after the other. Unfortunately those are also the words that are often stuck together in a lot of instances and confuses the splitter.

Here’s the link to the data (it’s data for 3 separate problems and segmentation is only one. Please read the chapter for details): http://norvig.com/ngrams/

and here’s the link to the code: http://norvig.com/ngrams/ngrams.py

These links have been up a while, but I’ll copy paste the segmentation part of the code here anyway

import re, string, random, glob, operator, heapq
from collections import defaultdict
from math import log10

def memo(f):
    "Memoize function f."
    table = {}
    def fmemo(*args):
        if args not in table:
            table[args] = f(*args)
        return table[args]
    fmemo.memo = table
    return fmemo

def test(verbose=None):
    """Run some tests, taken from the chapter.
    Since the hillclimbing algorithm is randomized, some tests may fail."""
    import doctest
    print 'Running tests...'
    doctest.testfile('ngrams-test.txt', verbose=verbose)

################ Word Segmentation (p. 223)

@memo
def segment(text):
    "Return a list of words that is the best segmentation of text."
    if not text: return []
    candidates = ([first]+segment(rem) for first,rem in splits(text))
    return max(candidates, key=Pwords)

def splits(text, L=20):
    "Return a list of all possible (first, rem) pairs, len(first)<=L."
    return [(text[:i+1], text[i+1:]) 
            for i in range(min(len(text), L))]

def Pwords(words): 
    "The Naive Bayes probability of a sequence of words."
    return product(Pw(w) for w in words)

#### Support functions (p. 224)

def product(nums):
    "Return the product of a sequence of numbers."
    return reduce(operator.mul, nums, 1)

class Pdist(dict):
    "A probability distribution estimated from counts in datafile."
    def __init__(self, data=[], N=None, missingfn=None):
        for key,count in data:
            self[key] = self.get(key, 0) + int(count)
        self.N = float(N or sum(self.itervalues()))
        self.missingfn = missingfn or (lambda k, N: 1./N)
    def __call__(self, key): 
        if key in self: return self[key]/self.N  
        else: return self.missingfn(key, self.N)

def datafile(name, sep='\t'):
    "Read key,value pairs from file."
    for line in file(name):
        yield line.split(sep)

def avoid_long_words(key, N):
    "Estimate the probability of an unknown word."
    return 10./(N * 10**len(key))

N = 1024908267229 ## Number of tokens

Pw  = Pdist(datafile('count_1w.txt'), N, avoid_long_words)

#### segment2: second version, with bigram counts, (p. 226-227)

def cPw(word, prev):
    "Conditional probability of word, given previous word."
    try:
        return P2w[prev + ' ' + word]/float(Pw[prev])
    except KeyError:
        return Pw(word)

P2w = Pdist(datafile('count_2w.txt'), N)

@memo 
def segment2(text, prev='<S>'): 
    "Return (log P(words), words), where words is the best segmentation." 
    if not text: return 0.0, [] 
    candidates = [combine(log10(cPw(first, prev)), first, segment2(rem, first)) 
                  for first,rem in splits(text)] 
    return max(candidates) 

def combine(Pfirst, first, (Prem, rem)): 
    "Combine first and rem results into one (probability, words) pair." 
    return Pfirst+Prem, [first]+rem 

回答 6

这是转换为JavaScript的可接受答案(需要node.js,以及来自https://github.com/keredson/wordninja的文件“ wordninja_words.txt” ):

var fs = require("fs");

var splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
var maxWordLen = 0;
var wordCost = {};

fs.readFile("./wordninja_words.txt", 'utf8', function(err, data) {
    if (err) {
        throw err;
    }
    var words = data.split('\n');
    words.forEach(function(word, index) {
        wordCost[word] = Math.log((index + 1) * Math.log(words.length));
    })
    words.forEach(function(word) {
        if (word.length > maxWordLen)
            maxWordLen = word.length;
    });
    console.log(maxWordLen)
    splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
    console.log(split(process.argv[2]));
});


function split(s) {
    var list = [];
    s.split(splitRegex).forEach(function(sub) {
        _split(sub).forEach(function(word) {
            list.push(word);
        })
    })
    return list;
}
module.exports = split;


function _split(s) {
    var cost = [0];

    function best_match(i) {
        var candidates = cost.slice(Math.max(0, i - maxWordLen), i).reverse();
        var minPair = [Number.MAX_SAFE_INTEGER, 0];
        candidates.forEach(function(c, k) {
            if (wordCost[s.substring(i - k - 1, i).toLowerCase()]) {
                var ccost = c + wordCost[s.substring(i - k - 1, i).toLowerCase()];
            } else {
                var ccost = Number.MAX_SAFE_INTEGER;
            }
            if (ccost < minPair[0]) {
                minPair = [ccost, k + 1];
            }
        })
        return minPair;
    }

    for (var i = 1; i < s.length + 1; i++) {
        cost.push(best_match(i)[0]);
    }

    var out = [];
    i = s.length;
    while (i > 0) {
        var c = best_match(i)[0];
        var k = best_match(i)[1];
        if (c == cost[i])
            console.log("Alert: " + c);

        var newToken = true;
        if (s.slice(i - k, i) != "'") {
            if (out.length > 0) {
                if (out[-1] == "'s" || (Number.isInteger(s[i - 1]) && Number.isInteger(out[-1][0]))) {
                    out[-1] = s.slice(i - k, i) + out[-1];
                    newToken = false;
                }
            }
        }

        if (newToken) {
            out.push(s.slice(i - k, i))
        }

        i -= k

    }
    return out.reverse();
}

Here is the accepted answer translated to JavaScript (requires node.js, and the file “wordninja_words.txt” from https://github.com/keredson/wordninja):

var fs = require("fs");

var splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
var maxWordLen = 0;
var wordCost = {};

fs.readFile("./wordninja_words.txt", 'utf8', function(err, data) {
    if (err) {
        throw err;
    }
    var words = data.split('\n');
    words.forEach(function(word, index) {
        wordCost[word] = Math.log((index + 1) * Math.log(words.length));
    })
    words.forEach(function(word) {
        if (word.length > maxWordLen)
            maxWordLen = word.length;
    });
    console.log(maxWordLen)
    splitRegex = new RegExp("[^a-zA-Z0-9']+", "g");
    console.log(split(process.argv[2]));
});


function split(s) {
    var list = [];
    s.split(splitRegex).forEach(function(sub) {
        _split(sub).forEach(function(word) {
            list.push(word);
        })
    })
    return list;
}
module.exports = split;


function _split(s) {
    var cost = [0];

    function best_match(i) {
        var candidates = cost.slice(Math.max(0, i - maxWordLen), i).reverse();
        var minPair = [Number.MAX_SAFE_INTEGER, 0];
        candidates.forEach(function(c, k) {
            if (wordCost[s.substring(i - k - 1, i).toLowerCase()]) {
                var ccost = c + wordCost[s.substring(i - k - 1, i).toLowerCase()];
            } else {
                var ccost = Number.MAX_SAFE_INTEGER;
            }
            if (ccost < minPair[0]) {
                minPair = [ccost, k + 1];
            }
        })
        return minPair;
    }

    for (var i = 1; i < s.length + 1; i++) {
        cost.push(best_match(i)[0]);
    }

    var out = [];
    i = s.length;
    while (i > 0) {
        var c = best_match(i)[0];
        var k = best_match(i)[1];
        if (c == cost[i])
            console.log("Alert: " + c);

        var newToken = true;
        if (s.slice(i - k, i) != "'") {
            if (out.length > 0) {
                if (out[-1] == "'s" || (Number.isInteger(s[i - 1]) && Number.isInteger(out[-1][0]))) {
                    out[-1] = s.slice(i - k, i) + out[-1];
                    newToken = false;
                }
            }
        }

        if (newToken) {
            out.push(s.slice(i - k, i))
        }

        i -= k

    }
    return out.reverse();
}

回答 7

如果将单词表预编译到DFA中(这将非常慢),则匹配输入所花费的时间将与字符串的长度成比例(实际上,这仅比遍历字符串慢一点)。

实际上,这是前面提到的trie算法的更通用的版本。我只是为了完全完成而提到它-到目前为止,您还不能使用DFA实现。RE2可以使用,但是我不知道Python绑定是否可以让您调整DFA的大小,然后再丢弃编译的DFA数据并进行NFA搜索。

If you precompile the wordlist into a DFA (which will be very slow), then the time it takes to match an input will be proportional to the length of the string (in fact, only a little slower than just iterating over the string).

This is effectively a more general version of the trie algorithm which was mentioned earlier. I only mention it for completeless — as of yet, there’s no DFA implementation you can just use. RE2 would work, but I don’t know if the Python bindings let you tune how large you allow a DFA to be before it just throws away the compiled DFA data and does NFA search.


回答 8

看起来相当平凡的回溯可以做到。从字符串的开头开始。向右扫描,直到您有一个字。然后,在字符串的其余部分调用该函数。如果函数一直扫描到右边而没有识别出一个单词,则该函数返回“ false”。否则,返回找到的单词以及递归调用返回的单词列表。

示例:“ tableapple”。查找“ tab”,然后查找“ leap”,但“ ple”中没有单词。“ leapple”中没有其他词。查找“表”,然后查找“ app”。“ le”不是一个词,所以尝试苹果,认识并返回。

为了尽可能长的时间,请继续前进,只发出(而不是返回)正确的解决方案;然后,根据您选择的任何标准(最大值,最小值,平均值等)选择最佳的一种

It seems like fairly mundane backtracking will do. Start at the beggining of the string. Scan right until you have a word. Then, call the function on the rest of the string. Function returns “false” if it scans all the way to the right without recognizing a word. Otherwise, returns the word it found and the list of words returned by the recursive call.

Example: “tableapple”. Finds “tab”, then “leap”, but no word in “ple”. No other word in “leapple”. Finds “table”, then “app”. “le” not a word, so tries apple, recognizes, returns.

To get longest possible, keep going, only emitting (rather than returning) correct solutions; then, choose the optimal one by any criterion you choose (maxmax, minmax, average, etc.)


回答 9

基于unutbu的解决方案,我实现了Java版本:

private static List<String> splitWordWithoutSpaces(String instring, String suffix) {
    if(isAWord(instring)) {
        if(suffix.length() > 0) {
            List<String> rest = splitWordWithoutSpaces(suffix, "");
            if(rest.size() > 0) {
                List<String> solutions = new LinkedList<>();
                solutions.add(instring);
                solutions.addAll(rest);
                return solutions;
            }
        } else {
            List<String> solutions = new LinkedList<>();
            solutions.add(instring);
            return solutions;
        }

    }
    if(instring.length() > 1) {
        String newString = instring.substring(0, instring.length()-1);
        suffix = instring.charAt(instring.length()-1) + suffix;
        List<String> rest = splitWordWithoutSpaces(newString, suffix);
        return rest;
    }
    return Collections.EMPTY_LIST;
}

输入: "tableapplechairtablecupboard"

输出: [table, apple, chair, table, cupboard]

输入: "tableprechaun"

输出: [tab, leprechaun]

Based on unutbu’s solution I’ve implemented a Java version:

private static List<String> splitWordWithoutSpaces(String instring, String suffix) {
    if(isAWord(instring)) {
        if(suffix.length() > 0) {
            List<String> rest = splitWordWithoutSpaces(suffix, "");
            if(rest.size() > 0) {
                List<String> solutions = new LinkedList<>();
                solutions.add(instring);
                solutions.addAll(rest);
                return solutions;
            }
        } else {
            List<String> solutions = new LinkedList<>();
            solutions.add(instring);
            return solutions;
        }

    }
    if(instring.length() > 1) {
        String newString = instring.substring(0, instring.length()-1);
        suffix = instring.charAt(instring.length()-1) + suffix;
        List<String> rest = splitWordWithoutSpaces(newString, suffix);
        return rest;
    }
    return Collections.EMPTY_LIST;
}

Input: "tableapplechairtablecupboard"

Output: [table, apple, chair, table, cupboard]

Input: "tableprechaun"

Output: [tab, leprechaun]


回答 10

对于德语,有CharSplit,它使用机器学习,并且对几个单词的字符串非常有效。

https://github.com/dtuggener/CharSplit

For German language there is CharSplit which uses machine learning and works pretty good for strings of a few words.

https://github.com/dtuggener/CharSplit


回答 11

扩展@miku的建议使用a Trie,仅追加Trie是相对简单的实现python

class Node:
    def __init__(self, is_word=False):
        self.children = {}
        self.is_word = is_word

class TrieDictionary:
    def __init__(self, words=tuple()):
        self.root = Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, Node())
        node.is_word = True

    def lookup(self, word, from_node=None):
        node = self.root if from_node is None else from_node
        for c in word:
            try:
                node = node.children[c]
            except KeyError:
                return None

        return node

然后,我们可以Trie根据一组单词构建一个基于字典的字典:

dictionary = {"a", "pea", "nut", "peanut", "but", "butt", "butte", "butter"}
trie_dictionary = TrieDictionary(words=dictionary)

它将产生如下所示的树(*指示单词的开头或结尾):

* -> a*
 \\\ 
  \\\-> p -> e -> a*
   \\              \-> n -> u -> t*
    \\
     \\-> b -> u -> t*
      \\             \-> t*
       \\                 \-> e*
        \\                     \-> r*
         \
          \-> n -> u -> t*

通过将其与关于如何选择单词的启发式方法相结合,我们可以将其合并到解决方案中。例如,与较短的单词相比,我们更喜欢较长的单词:

def using_trie_longest_word_heuristic(s):
    node = None
    possible_indexes = []

    # O(1) short-circuit if whole string is a word, doesn't go against longest-word wins
    if s in dictionary:
        return [ s ]

    for i in range(len(s)):
        # traverse the trie, char-wise to determine intermediate words
        node = trie_dictionary.lookup(s[i], from_node=node)

        # no more words start this way
        if node is None:
            # iterate words we have encountered from biggest to smallest
            for possible in possible_indexes[::-1]:
                # recurse to attempt to solve the remaining sub-string
                end_of_phrase = using_trie_longest_word_heuristic(s[possible+1:])

                # if we have a solution, return this word + our solution
                if end_of_phrase:
                    return [ s[:possible+1] ] + end_of_phrase

            # unsolvable
            break

        # if this is a leaf, append the index to the possible words list
        elif node.is_word:
            possible_indexes.append(i)

    # empty string OR unsolvable case 
    return []

我们可以这样使用此功能:

>>> using_trie_longest_word_heuristic("peanutbutter")
[ "peanut", "butter" ]

由于我们在Trie搜索越来越长的单词时一直保持在的位置,因此trie每个可能的解决方案最多遍历一次(而不是:,的2次数)。最终的短路使我们免于在最坏的情况下通过字符串随意移动。peanutpeapeanut

最终结果仅需进行少量检查:

'peanutbutter' - not a word, go charwise
'p' - in trie, use this node
'e' - in trie, use this node
'a' - in trie and edge, store potential word and use this node
'n' - in trie, use this node
'u' - in trie, use this node
't' - in trie and edge, store potential word and use this node
'b' - not in trie from `peanut` vector
'butter' - remainder of longest is a word

该解决方案的一个好处是,您可以很快地知道是否存在带有给定前缀的较长单词,从而无需针对字典全面测试序列组合。unsolvable与其他实现方式相比,这样做还使答案相对便宜。

该解决方案的缺点是要占用大量内存,trie以及trie前期构建成本。

Expanding on @miku’s suggestion to use a Trie, an append-only Trie is relatively straight-forward to implement in python:

class Node:
    def __init__(self, is_word=False):
        self.children = {}
        self.is_word = is_word

class TrieDictionary:
    def __init__(self, words=tuple()):
        self.root = Node()
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, Node())
        node.is_word = True

    def lookup(self, word, from_node=None):
        node = self.root if from_node is None else from_node
        for c in word:
            try:
                node = node.children[c]
            except KeyError:
                return None

        return node

We can then build a Trie-based dictionary from a set of words:

dictionary = {"a", "pea", "nut", "peanut", "but", "butt", "butte", "butter"}
trie_dictionary = TrieDictionary(words=dictionary)

Which will produce a tree that looks like this (* indicates beginning or end of a word):

* -> a*
 \\\ 
  \\\-> p -> e -> a*
   \\              \-> n -> u -> t*
    \\
     \\-> b -> u -> t*
      \\             \-> t*
       \\                 \-> e*
        \\                     \-> r*
         \
          \-> n -> u -> t*

We can incorporate this into a solution by combining it with a heuristic about how to choose words. For example we can prefer longer words over shorter words:

def using_trie_longest_word_heuristic(s):
    node = None
    possible_indexes = []

    # O(1) short-circuit if whole string is a word, doesn't go against longest-word wins
    if s in dictionary:
        return [ s ]

    for i in range(len(s)):
        # traverse the trie, char-wise to determine intermediate words
        node = trie_dictionary.lookup(s[i], from_node=node)

        # no more words start this way
        if node is None:
            # iterate words we have encountered from biggest to smallest
            for possible in possible_indexes[::-1]:
                # recurse to attempt to solve the remaining sub-string
                end_of_phrase = using_trie_longest_word_heuristic(s[possible+1:])

                # if we have a solution, return this word + our solution
                if end_of_phrase:
                    return [ s[:possible+1] ] + end_of_phrase

            # unsolvable
            break

        # if this is a leaf, append the index to the possible words list
        elif node.is_word:
            possible_indexes.append(i)

    # empty string OR unsolvable case 
    return []

We can use this function like this:

>>> using_trie_longest_word_heuristic("peanutbutter")
[ "peanut", "butter" ]

Because we maintain our position in the Trie as we search for longer and longer words, we traverse the trie at most once per possible solution (rather than 2 times for peanut: pea, peanut). The final short-circuit saves us from walking char-wise through the string in the worst-case.

The final result is only a handful of inspections:

'peanutbutter' - not a word, go charwise
'p' - in trie, use this node
'e' - in trie, use this node
'a' - in trie and edge, store potential word and use this node
'n' - in trie, use this node
'u' - in trie, use this node
't' - in trie and edge, store potential word and use this node
'b' - not in trie from `peanut` vector
'butter' - remainder of longest is a word

A benefit of this solution is in the fact that you know very quickly if longer words exist with a given prefix, which spares the need to exhaustively test sequence combinations against a dictionary. It also makes getting to an unsolvable answer comparatively cheap to other implementations.

The downsides of this solution are a large memory footprint for the trie and the cost of building the trie up-front.


回答 12

如果您包含字符串中包含的单词的详尽列表:

word_list = ["table", "apple", "chair", "cupboard"]

使用列表理解来遍历列表以找到单词以及单词出现的次数。

string = "tableapplechairtablecupboard"

def split_string(string, word_list):

    return ("".join([(item + " ")*string.count(item.lower()) for item in word_list if item.lower() in string])).strip()

该函数string按列表顺序返回单词的输出table table apple chair cupboard

If you have an exhaustive list of the words contained within the string:

word_list = ["table", "apple", "chair", "cupboard"]

Using list comprehension to iterate over the list to locate the word and how many times it appears.

string = "tableapplechairtablecupboard"

def split_string(string, word_list):

    return ("".join([(item + " ")*string.count(item.lower()) for item in word_list if item.lower() in string])).strip()


The function returns a string output of words in order of the list table table apple chair cupboard


回答 13

非常感谢https://github.com/keredson/wordninja/中的帮助

在我看来,Java的贡献很小。

可以将public方法splitContiguousWords与其他2种方法一起嵌入在同一目录中具有ninja_words.txt的类中(或根据编码器的选择进行修改)。并且该方法splitContiguousWords可以用于该目的。

public List<String> splitContiguousWords(String sentence) {

    String splitRegex = "[^a-zA-Z0-9']+";
    Map<String, Number> wordCost = new HashMap<>();
    List<String> dictionaryWords = IOUtils.linesFromFile("ninja_words.txt", StandardCharsets.UTF_8.name());
    double naturalLogDictionaryWordsCount = Math.log(dictionaryWords.size());
    long wordIdx = 0;
    for (String word : dictionaryWords) {
        wordCost.put(word, Math.log(++wordIdx * naturalLogDictionaryWordsCount));
    }
    int maxWordLength = Collections.max(dictionaryWords, Comparator.comparing(String::length)).length();
    List<String> splitWords = new ArrayList<>();
    for (String partSentence : sentence.split(splitRegex)) {
        splitWords.add(split(partSentence, wordCost, maxWordLength));
    }
    log.info("Split word for the sentence: {}", splitWords);
    return splitWords;
}

private String split(String partSentence, Map<String, Number> wordCost, int maxWordLength) {
    List<Pair<Number, Number>> cost = new ArrayList<>();
    cost.add(new Pair<>(Integer.valueOf(0), Integer.valueOf(0)));
    for (int index = 1; index < partSentence.length() + 1; index++) {
        cost.add(bestMatch(partSentence, cost, index, wordCost, maxWordLength));
    }
    int idx = partSentence.length();
    List<String> output = new ArrayList<>();
    while (idx > 0) {
        Pair<Number, Number> candidate = bestMatch(partSentence, cost, idx, wordCost, maxWordLength);
        Number candidateCost = candidate.getKey();
        Number candidateIndexValue = candidate.getValue();
        if (candidateCost.doubleValue() != cost.get(idx).getKey().doubleValue()) {
            throw new RuntimeException("Candidate cost unmatched; This should not be the case!");
        }
        boolean newToken = true;
        String token = partSentence.substring(idx - candidateIndexValue.intValue(), idx);
        if (token != "\'" && output.size() > 0) {
            String lastWord = output.get(output.size() - 1);
            if (lastWord.equalsIgnoreCase("\'s") ||
                    (Character.isDigit(partSentence.charAt(idx - 1)) && Character.isDigit(lastWord.charAt(0)))) {
                output.set(output.size() - 1, token + lastWord);
                newToken = false;
            }
        }
        if (newToken) {
            output.add(token);
        }
        idx -= candidateIndexValue.intValue();
    }
    return String.join(" ", Lists.reverse(output));
}


private Pair<Number, Number> bestMatch(String partSentence, List<Pair<Number, Number>> cost, int index,
                      Map<String, Number> wordCost, int maxWordLength) {
    List<Pair<Number, Number>> candidates = Lists.reverse(cost.subList(Math.max(0, index - maxWordLength), index));
    int enumerateIdx = 0;
    Pair<Number, Number> minPair = new Pair<>(Integer.MAX_VALUE, Integer.valueOf(enumerateIdx));
    for (Pair<Number, Number> pair : candidates) {
        ++enumerateIdx;
        String subsequence = partSentence.substring(index - enumerateIdx, index).toLowerCase();
        Number minCost = Integer.MAX_VALUE;
        if (wordCost.containsKey(subsequence)) {
            minCost = pair.getKey().doubleValue() + wordCost.get(subsequence).doubleValue();
        }
        if (minCost.doubleValue() < minPair.getKey().doubleValue()) {
            minPair = new Pair<>(minCost.doubleValue(), enumerateIdx);
        }
    }
    return minPair;
}

Many Thanks for the help in https://github.com/keredson/wordninja/

A small contribution of the same in Java from my side.

The public method splitContiguousWords could be embedded with the other 2 methods in the class having ninja_words.txt in same directory(or modified as per the choice of coder). And the method splitContiguousWords could be used for the purpose.

public List<String> splitContiguousWords(String sentence) {

    String splitRegex = "[^a-zA-Z0-9']+";
    Map<String, Number> wordCost = new HashMap<>();
    List<String> dictionaryWords = IOUtils.linesFromFile("ninja_words.txt", StandardCharsets.UTF_8.name());
    double naturalLogDictionaryWordsCount = Math.log(dictionaryWords.size());
    long wordIdx = 0;
    for (String word : dictionaryWords) {
        wordCost.put(word, Math.log(++wordIdx * naturalLogDictionaryWordsCount));
    }
    int maxWordLength = Collections.max(dictionaryWords, Comparator.comparing(String::length)).length();
    List<String> splitWords = new ArrayList<>();
    for (String partSentence : sentence.split(splitRegex)) {
        splitWords.add(split(partSentence, wordCost, maxWordLength));
    }
    log.info("Split word for the sentence: {}", splitWords);
    return splitWords;
}

private String split(String partSentence, Map<String, Number> wordCost, int maxWordLength) {
    List<Pair<Number, Number>> cost = new ArrayList<>();
    cost.add(new Pair<>(Integer.valueOf(0), Integer.valueOf(0)));
    for (int index = 1; index < partSentence.length() + 1; index++) {
        cost.add(bestMatch(partSentence, cost, index, wordCost, maxWordLength));
    }
    int idx = partSentence.length();
    List<String> output = new ArrayList<>();
    while (idx > 0) {
        Pair<Number, Number> candidate = bestMatch(partSentence, cost, idx, wordCost, maxWordLength);
        Number candidateCost = candidate.getKey();
        Number candidateIndexValue = candidate.getValue();
        if (candidateCost.doubleValue() != cost.get(idx).getKey().doubleValue()) {
            throw new RuntimeException("Candidate cost unmatched; This should not be the case!");
        }
        boolean newToken = true;
        String token = partSentence.substring(idx - candidateIndexValue.intValue(), idx);
        if (token != "\'" && output.size() > 0) {
            String lastWord = output.get(output.size() - 1);
            if (lastWord.equalsIgnoreCase("\'s") ||
                    (Character.isDigit(partSentence.charAt(idx - 1)) && Character.isDigit(lastWord.charAt(0)))) {
                output.set(output.size() - 1, token + lastWord);
                newToken = false;
            }
        }
        if (newToken) {
            output.add(token);
        }
        idx -= candidateIndexValue.intValue();
    }
    return String.join(" ", Lists.reverse(output));
}


private Pair<Number, Number> bestMatch(String partSentence, List<Pair<Number, Number>> cost, int index,
                      Map<String, Number> wordCost, int maxWordLength) {
    List<Pair<Number, Number>> candidates = Lists.reverse(cost.subList(Math.max(0, index - maxWordLength), index));
    int enumerateIdx = 0;
    Pair<Number, Number> minPair = new Pair<>(Integer.MAX_VALUE, Integer.valueOf(enumerateIdx));
    for (Pair<Number, Number> pair : candidates) {
        ++enumerateIdx;
        String subsequence = partSentence.substring(index - enumerateIdx, index).toLowerCase();
        Number minCost = Integer.MAX_VALUE;
        if (wordCost.containsKey(subsequence)) {
            minCost = pair.getKey().doubleValue() + wordCost.get(subsequence).doubleValue();
        }
        if (minCost.doubleValue() < minPair.getKey().doubleValue()) {
            minPair = new Pair<>(minCost.doubleValue(), enumerateIdx);
        }
    }
    return minPair;
}

回答 14

这会有所帮助

from wordsegment import load, segment
load()
segment('providesfortheresponsibilitiesofperson')

This will help

from wordsegment import load, segment
load()
segment('providesfortheresponsibilitiesofperson')

回答 15

您需要确定您的词汇表-也许任何免费单词列表都可以。

完成后,使用该词汇表来构建后缀树,并将输入流与此匹配:http : //en.wikipedia.org/wiki/Suffix_tree

You need to identify your vocabulary – perhaps any free word list will do.

Once done, use that vocabulary to build a suffix tree, and match your stream of input against that: http://en.wikipedia.org/wiki/Suffix_tree


如何将字符串散列为8位数字?

问题:如何将字符串散列为8位数字?

无论如何,我可以将随机字符串散列为8位数字而无需自己实现任何算法吗?

Is there anyway that I can hash a random string into a 8 digit number without implementing any algorithms myself?


回答 0

是的,您可以使用内置的hashlib模块或内置的hash函数。然后,对整数形式的哈希使用模运算或字符串切片运算来切掉最后八位数字:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

Yes, you can use the built-in hashlib modules or the built-in hash function. Then, chop-off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

回答 1

Raymond的答案对于python2非常有用(尽管您不需要abs()或10 ** 8附近的括号)。但是,对于python3,有一些重要的警告。首先,您需要确保传递编码的字符串。如今,在大多数情况下,最好避开sha-1并改用sha-256之类的东西。因此,hashlib方法将是:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

如果要改用hash()函数,则需要注意的重要一点是,与Python 2.x不同,在Python 3.x中,hash()的结果将仅在进程内保持一致,而在整个python调用之间均保持一致。看这里:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

这意味着建议使用基于hash()的解决方案,可以将其简化为:

hash(s) % 10**8

将仅在给定的脚本运行中返回相同的值:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

因此,取决于这是否在您的应用程序中很重要(它在我的应用程序中确实如此),您可能希望坚持使用基于hashlib的方法。

Raymond’s answer is great for python2 (though, you don’t need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you’ll need to make sure you are passing an encoded string. These days, in most circumstances, it’s probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

This means the hash()-based solution suggested, which can be shortened to just:

hash(s) % 10**8

will only return the same value within a given script run:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

So, depending on if this matters in your application (it did in mine), you’ll probably want to stick to the hashlib-based approach.


回答 2

只是为了完成JJC答案,在python 3.5.3中,如果您通过以下方式使用hashlib,则行为是正确的:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

Just to complete JJC answer, in python 3.5.3 the behavior is correct if you use hashlib this way:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

回答 3

我将分享由@Raymond Hettinger实施的解决方案的nodejs实施。

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

I am sharing our nodejs implementation of the solution as implemented by @Raymond Hettinger.

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

获取数的所有除数的最佳方法是什么?

问题:获取数的所有除数的最佳方法是什么?

这是非常愚蠢的方式:

def divisorGenerator(n):
    for i in xrange(1,n/2+1):
        if n%i == 0: yield i
    yield n

我想要得到的结果与此类似,但是我想要一个更智能的算法(这个算法太慢而且太笨了:-)

我可以很快找到主要因素及其多样性。我有一个生成器以这种方式生成因子:

(因数1,多重性1)(因数
2,多重性2)
(因数3,多重性3)
等等…

即输出

for i in factorGenerator(100):
    print i

是:

(2, 2)
(5, 2)

我不知道这对我想做的事情有多大帮助(我为其他问题编写了代码),无论如何,我都希望有一种更聪明的制作方法

for i in divisorGen(100):
    print i

输出:

1
2
4
5
10
20
25
50
100

更新:非常感谢Greg Hewgill和他的“智能方式” :)计算100000000的所有除数,而用39s的方式计算了我的机器上愚蠢的方式花费了0.01s,这很酷:D

更新2:别说这是这篇文章的重复。计算给定数的除数无需计算所有除数。这是一个不同的问题,如果您认为不是这样,那么请在Wikipedia上查找“除数函数”。在发布之前,请先阅读问题和答案,如果您不明白主题是什么,请不要添加无用且已经给出答案的内容。

Here’s the very dumb way:

def divisorGenerator(n):
    for i in xrange(1,n/2+1):
        if n%i == 0: yield i
    yield n

The result I’d like to get is similar to this one, but I’d like a smarter algorithm (this one it’s too much slow and dumb :-)

I can find prime factors and their multiplicity fast enough. I’ve an generator that generates factor in this way:

(factor1, multiplicity1)
(factor2, multiplicity2)
(factor3, multiplicity3)
and so on…

i.e. the output of

for i in factorGenerator(100):
    print i

is:

(2, 2)
(5, 2)

I don’t know how much is this useful for what I want to do (I coded it for other problems), anyway I’d like a smarter way to make

for i in divisorGen(100):
    print i

output this:

1
2
4
5
10
20
25
50
100

UPDATE: Many thanks to Greg Hewgill and his “smart way” :) Calculating all divisors of 100000000 took 0.01s with his way against the 39s that the dumb way took on my machine, very cool :D

UPDATE 2: Stop saying this is a duplicate of this post. Calculating the number of divisor of a given number doesn’t need to calculate all the divisors. It’s a different problem, if you think it’s not then look for “Divisor function” on wikipedia. Read the questions and the answer before posting, if you do not understand what is the topic just don’t add not useful and already given answers.


回答 0

给定您的factorGenerator功能,这里divisorGen应该可以工作:

def divisorGen(n):
    factors = list(factorGenerator(n))
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return

该算法的整体效率将完全取决于的效率factorGenerator

Given your factorGenerator function, here is a divisorGen that should work:

def divisorGen(n):
    factors = list(factorGenerator(n))
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield reduce(lambda x, y: x*y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return

The overall efficiency of this algorithm will depend entirely on the efficiency of the factorGenerator.


回答 1

要扩展Shimi所说的话,您应该只在1到n的平方根之间运行循环。然后找到对,执行n / i,这将覆盖整个问题空间。

还要指出的是,这是一个NP或“困难”的问题。穷举搜索(您正在执行的方式)与保证答案的效果差不多。加密算法等使用此事实来帮助保护它们。如果有人要解决这个问题,那么我们目前大多数的“安全”通信,即使不是全部,也会变得不安全。

Python代码:

import math

def divisorGenerator(n):
    large_divisors = []
    for i in xrange(1, int(math.sqrt(n) + 1)):
        if n % i == 0:
            yield i
            if i*i != n:
                large_divisors.append(n / i)
    for divisor in reversed(large_divisors):
        yield divisor

print list(divisorGenerator(100))

哪个应该输出类似以下的列表:

[1、2、4、5、10、20、25、50、100]

To expand on what Shimi has said, you should only be running your loop from 1 to the square root of n. Then to find the pair, do n / i, and this will cover the whole problem space.

As was also noted, this is a NP, or ‘difficult’ problem. Exhaustive search, the way you are doing it, is about as good as it gets for guaranteed answers. This fact is used by encryption algorithms and the like to help secure them. If someone were to solve this problem, most if not all of our current ‘secure’ communication would be rendered insecure.

Python code:

import math

def divisorGenerator(n):
    large_divisors = []
    for i in xrange(1, int(math.sqrt(n) + 1)):
        if n % i == 0:
            yield i
            if i*i != n:
                large_divisors.append(n / i)
    for divisor in reversed(large_divisors):
        yield divisor

print list(divisorGenerator(100))

Which should output a list like:

[1, 2, 4, 5, 10, 20, 25, 50, 100]

回答 2

尽管已经有很多解决方案,但我确实必须发布此内容:)

这是:

  • 可读的
  • 自包含,可复制并粘贴
  • 快速(在有很多主要因素和因数的情况下,比公认的解决方案快10倍以上)
  • 符合python3,python2和pypy

码:

def divisors(n):
    # get factors and their counts
    factors = {}
    nn = n
    i = 2
    while i*i <= nn:
        while nn % i == 0:
            factors[i] = factors.get(i, 0) + 1
            nn //= i
        i += 1
    if nn > 1:
        factors[nn] = factors.get(nn, 0) + 1

    primes = list(factors.keys())

    # generates factors from primes[k:] subset
    def generate(k):
        if k == len(primes):
            yield 1
        else:
            rest = generate(k+1)
            prime = primes[k]
            for factor in rest:
                prime_to_i = 1
                # prime_to_i iterates prime**i values, i being all possible exponents
                for _ in range(factors[prime] + 1):
                    yield factor * prime_to_i
                    prime_to_i *= prime

    # in python3, `yield from generate(0)` would also work
    for factor in generate(0):
        yield factor

Although there are already many solutions to this, I really have to post this :)

This one is:

  • readable
  • short
  • self contained, copy & paste ready
  • quick (in cases with a lot of prime factors and divisors, > 10 times faster than the accepted solution)
  • python3, python2 and pypy compliant

Code:

def divisors(n):
    # get factors and their counts
    factors = {}
    nn = n
    i = 2
    while i*i <= nn:
        while nn % i == 0:
            factors[i] = factors.get(i, 0) + 1
            nn //= i
        i += 1
    if nn > 1:
        factors[nn] = factors.get(nn, 0) + 1

    primes = list(factors.keys())

    # generates factors from primes[k:] subset
    def generate(k):
        if k == len(primes):
            yield 1
        else:
            rest = generate(k+1)
            prime = primes[k]
            for factor in rest:
                prime_to_i = 1
                # prime_to_i iterates prime**i values, i being all possible exponents
                for _ in range(factors[prime] + 1):
                    yield factor * prime_to_i
                    prime_to_i *= prime

    # in python3, `yield from generate(0)` would also work
    for factor in generate(0):
        yield factor

回答 3

我认为您可以停在,math.sqrt(n)而不是n / 2。

我会举一个例子,以便您容易理解。现在sqrt(28)5.29这样ceil(5.29)将为6所以我,如果我将在6停止那么我将可以得到所有的除数。怎么样?

首先查看代码,然后查看图片:

import math
def divisors(n):
    divs = [1]
    for i in xrange(2,int(math.sqrt(n))+1):
        if n%i == 0:
            divs.extend([i,n/i])
    divs.extend([n])
    return list(set(divs))

现在,请参见下图:

可以说我已经添加1到除数列表中,i=2所以我从

因此,在所有迭代的末尾,因为我将商和除数添加到列表中,所以填充了28的所有除数。

资料来源:如何确定数字的除数

I think you can stop at math.sqrt(n) instead of n/2.

I will give you example so that you can understand it easily. Now the sqrt(28) is 5.29 so ceil(5.29) will be 6. So I if I will stop at 6 then I will can get all the divisors. How?

First see the code and then see image:

import math
def divisors(n):
    divs = [1]
    for i in xrange(2,int(math.sqrt(n))+1):
        if n%i == 0:
            divs.extend([i,n/i])
    divs.extend([n])
    return list(set(divs))

Now, See the image below:

Lets say I have already added 1 to my divisors list and I start with i=2 so

So at the end of all the iterations as I have added the quotient and the divisor to my list all the divisors of 28 are populated.

Source: How to determine the divisors of a number


回答 4

我喜欢Greg解决方案,但我希望它更像python。我觉得它会更快,更易读。所以经过一段时间的编码后,我想到了这一点。

要创建列表的笛卡尔积,需要前两个功能。一旦出现此问题,便可以重复使用。顺便说一下,我必须自己编写程序,如果有人知道该问题的标准解决方案,请随时与我联系。

现在,“ Factorgenerator”将返回一个字典。然后将字典放入“除数”中,后者使用字典首先生成一个列表列表,其中每个列表都是具有p素数的p ^ n形式的因子的列表。然后,我们生成这些列表的笛卡尔乘积,最后使用Greg的解决方案生成除数。我们对它们进行排序,然后将其退回。

我测试了它,它似乎比以前的版本要快一些。我将它作为一个更大的程序的一部分进行了测试,所以我不能真正说出它快多少。

彼得罗·斯佩罗尼(Pietrosperoni点它)

from math import sqrt


##############################################################
### cartesian product of lists ##################################
##############################################################

def appendEs2Sequences(sequences,es):
    result=[]
    if not sequences:
        for e in es:
            result.append([e])
    else:
        for e in es:
            result+=[seq+[e] for seq in sequences]
    return result


def cartesianproduct(lists):
    """
    given a list of lists,
    returns all the possible combinations taking one element from each list
    The list does not have to be of equal length
    """
    return reduce(appendEs2Sequences,lists,[])

##############################################################
### prime factors of a natural ##################################
##############################################################

def primefactors(n):
    '''lists prime factors, from greatest to smallest'''  
    i = 2
    while i<=sqrt(n):
        if n%i==0:
            l = primefactors(n/i)
            l.append(i)
            return l
        i+=1
    return [n]      # n is prime


##############################################################
### factorization of a natural ##################################
##############################################################

def factorGenerator(n):
    p = primefactors(n)
    factors={}
    for p1 in p:
        try:
            factors[p1]+=1
        except KeyError:
            factors[p1]=1
    return factors

def divisors(n):
    factors = factorGenerator(n)
    divisors=[]
    listexponents=[map(lambda x:k**x,range(0,factors[k]+1)) for k in factors.keys()]
    listfactors=cartesianproduct(listexponents)
    for f in listfactors:
        divisors.append(reduce(lambda x, y: x*y, f, 1))
    divisors.sort()
    return divisors



print divisors(60668796879)

PS这是我第一次发布到stackoverflow。我期待任何反馈。

I like Greg solution, but I wish it was more python like. I feel it would be faster and more readable; so after some time of coding I came out with this.

The first two functions are needed to make the cartesian product of lists. And can be reused whnever this problem arises. By the way, I had to program this myself, if anyone knows of a standard solution for this problem, please feel free to contact me.

“Factorgenerator” now returns a dictionary. And then the dictionary is fed into “divisors”, who uses it to generate first a list of lists, where each list is the list of the factors of the form p^n with p prime. Then we make the cartesian product of those lists, and we finally use Greg’ solution to generate the divisor. We sort them, and return them.

I tested it and it seem to be a bit faster than the previous version. I tested it as part of a bigger program, so I can’t really say how much is it faster though.

Pietro Speroni (pietrosperoni dot it)

from math import sqrt


##############################################################
### cartesian product of lists ##################################
##############################################################

def appendEs2Sequences(sequences,es):
    result=[]
    if not sequences:
        for e in es:
            result.append([e])
    else:
        for e in es:
            result+=[seq+[e] for seq in sequences]
    return result


def cartesianproduct(lists):
    """
    given a list of lists,
    returns all the possible combinations taking one element from each list
    The list does not have to be of equal length
    """
    return reduce(appendEs2Sequences,lists,[])

##############################################################
### prime factors of a natural ##################################
##############################################################

def primefactors(n):
    '''lists prime factors, from greatest to smallest'''  
    i = 2
    while i<=sqrt(n):
        if n%i==0:
            l = primefactors(n/i)
            l.append(i)
            return l
        i+=1
    return [n]      # n is prime


##############################################################
### factorization of a natural ##################################
##############################################################

def factorGenerator(n):
    p = primefactors(n)
    factors={}
    for p1 in p:
        try:
            factors[p1]+=1
        except KeyError:
            factors[p1]=1
    return factors

def divisors(n):
    factors = factorGenerator(n)
    divisors=[]
    listexponents=[map(lambda x:k**x,range(0,factors[k]+1)) for k in factors.keys()]
    listfactors=cartesianproduct(listexponents)
    for f in listfactors:
        divisors.append(reduce(lambda x, y: x*y, f, 1))
    divisors.sort()
    return divisors



print divisors(60668796879)

P.S. it is the first time I am posting to stackoverflow. I am looking forward for any feedback.


回答 5

这是在纯Python 3.6中对10 ** 16左右的数字进行处理的一种智能,快速的方法,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

Here is a smart and fast way to do it for numbers up to and around 10**16 in pure Python 3.6,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

回答 6

改编自CodeReview,这是一个与num=1!一起使用的变体!

from itertools import product
import operator

def prod(ls):
   return reduce(operator.mul, ls, 1)

def powered(factors, powers):
   return prod(f**p for (f,p) in zip(factors, powers))


def divisors(num) :

   pf = dict(prime_factors(num))
   primes = pf.keys()
   #For each prime, possible exponents
   exponents = [range(i+1) for i in pf.values()]
   return (powered(primes,es) for es in product(*exponents))

Adapted from CodeReview, here is a variant which works with num=1 !

from itertools import product
import operator

def prod(ls):
   return reduce(operator.mul, ls, 1)

def powered(factors, powers):
   return prod(f**p for (f,p) in zip(factors, powers))


def divisors(num) :

   pf = dict(prime_factors(num))
   primes = pf.keys()
   #For each prime, possible exponents
   exponents = [range(i+1) for i in pf.values()]
   return (powered(primes,es) for es in product(*exponents))

回答 7

我将添加一个稍微修改过的Anivarth版本(因为我认为它是最Python的)以供将来参考。

from math import sqrt

def divisors(n):
    divs = {1,n}
    for i in range(2,int(sqrt(n))+1):
        if n%i == 0:
            divs.update((i,n//i))
    return divs

I’m just going to add a slightly revised version of Anivarth’s (as I believe it’s the most pythonic) for future reference.

from math import sqrt

def divisors(n):
    divs = {1,n}
    for i in range(2,int(sqrt(n))+1):
        if n%i == 0:
            divs.update((i,n//i))
    return divs

回答 8

旧问题,但这是我的看法:

def divs(n, m):
    if m == 1: return [1]
    if n % m == 0: return [m] + divs(n, m - 1)
    return divs(n, m - 1)

您可以代理:

def divisorGenerator(n):
    for x in reversed(divs(n, n)):
        yield x

注意:对于支持的语言,这可能是尾递归。

Old question, but here is my take:

def divs(n, m):
    if m == 1: return [1]
    if n % m == 0: return [m] + divs(n, m - 1)
    return divs(n, m - 1)

You can proxy with:

def divisorGenerator(n):
    for x in reversed(divs(n, n)):
        yield x

NOTE: For languages that support, this could be tail recursive.


回答 9

假设factors函数返回n的因数(例如,factors(60)返回列表[2,2,3,5]),这是一个计算n除数的函数

function divisors(n)
    divs := [1]
    for fact in factors(n)
        temp := []
        for div in divs
            if fact * div not in divs
                append fact * div to temp
        divs := divs + temp
    return divs

Assuming that the factors function returns the factors of n (for instance, factors(60) returns the list [2, 2, 3, 5]), here is a function to compute the divisors of n:

function divisors(n)
    divs := [1]
    for fact in factors(n)
        temp := []
        for div in divs
            if fact * div not in divs
                append fact * div to temp
        divs := divs + temp
    return divs

回答 10

这是我的解决方案。它似乎很愚蠢,但效果很好…而且我试图找到所有合适的除数,所以循环从i = 2开始。

import math as m 

def findfac(n):
    faclist = [1]
    for i in range(2, int(m.sqrt(n) + 2)):
        if n%i == 0:
            if i not in faclist:
                faclist.append(i)
                if n/i not in faclist:
                    faclist.append(n/i)
    return facts

Here’s my solution. It seems to be dumb but works well…and I was trying to find all proper divisors so the loop started from i = 2.

import math as m 

def findfac(n):
    faclist = [1]
    for i in range(2, int(m.sqrt(n) + 2)):
        if n%i == 0:
            if i not in faclist:
                faclist.append(i)
                if n/i not in faclist:
                    faclist.append(n/i)
    return facts

回答 11

如果您只在乎使用列表推导,对您而言别无其他!

from itertools import combinations
from functools import reduce

def get_devisors(n):
    f = [f for f,e in list(factorGenerator(n)) for i in range(e)]
    fc = [x for l in range(len(f)+1) for x in combinations(f, l)]
    devisors = [1 if c==() else reduce((lambda x, y: x * y), c) for c in set(fc)]
    return sorted(devisors)

If you only care about using list comprehensions and nothing else matters to you!

from itertools import combinations
from functools import reduce

def get_devisors(n):
    f = [f for f,e in list(factorGenerator(n)) for i in range(e)]
    fc = [x for l in range(len(f)+1) for x in combinations(f, l)]
    devisors = [1 if c==() else reduce((lambda x, y: x * y), c) for c in set(fc)]
    return sorted(devisors)

回答 12

如果您的PC拥有大量内存,那么使用numpy可以使单个行足够快:

N = 10000000; tst = np.arange(1, N); tst[np.mod(N, tst) == 0]
Out: 
array([      1,       2,       4,       5,       8,      10,      16,
            20,      25,      32,      40,      50,      64,      80,
           100,     125,     128,     160,     200,     250,     320,
           400,     500,     625,     640,     800,    1000,    1250,
          1600,    2000,    2500,    3125,    3200,    4000,    5000,
          6250,    8000,   10000,   12500,   15625,   16000,   20000,
         25000,   31250,   40000,   50000,   62500,   78125,   80000,
        100000,  125000,  156250,  200000,  250000,  312500,  400000,
        500000,  625000, 1000000, 1250000, 2000000, 2500000, 5000000])

在我的慢速PC上花费不到1秒。

If your PC has tons of memory, a brute single line can be fast enough with numpy:

N = 10000000; tst = np.arange(1, N); tst[np.mod(N, tst) == 0]
Out: 
array([      1,       2,       4,       5,       8,      10,      16,
            20,      25,      32,      40,      50,      64,      80,
           100,     125,     128,     160,     200,     250,     320,
           400,     500,     625,     640,     800,    1000,    1250,
          1600,    2000,    2500,    3125,    3200,    4000,    5000,
          6250,    8000,   10000,   12500,   15625,   16000,   20000,
         25000,   31250,   40000,   50000,   62500,   78125,   80000,
        100000,  125000,  156250,  200000,  250000,  312500,  400000,
        500000,  625000, 1000000, 1250000, 2000000, 2500000, 5000000])

Takes less than 1s on my slow PC.


回答 13

我通过生成器函数的解决方案是:

def divisor(num):
    for x in range(1, num + 1):
        if num % x == 0:
            yield x
    while True:
        yield None

My solution via generator function is:

def divisor(num):
    for x in range(1, num + 1):
        if num % x == 0:
            yield x
    while True:
        yield None

回答 14

return [x for x in range(n+1) if n/x==int(n/x)]
return [x for x in range(n+1) if n/x==int(n/x)]

回答 15

对我来说,这很好,也很干净(Python 3)

def divisors(number):
    n = 1
    while(n<number):
        if(number%n==0):
            print(n)
        else:
            pass
        n += 1
    print(number)

速度不是很快,但是可以按需逐行返回除数,如果您确实想要,也可以执行list.append(n)和list.append(number)

For me this works fine and is also clean (Python 3)

def divisors(number):
    n = 1
    while(n<number):
        if(number%n==0):
            print(n)
        else:
            pass
        n += 1
    print(number)

Not very fast but returns divisors line by line as you wanted, also you can do list.append(n) and list.append(number) if you really want to


检查列表中的所有元素是否唯一

问题:检查列表中的所有元素是否唯一

检查列表中所有元素是否唯一的最佳方法(与传统方法一样最佳)是什么?

我目前使用的方法Counter是:

>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.itervalues():
        if values > 1: 
            # do something

我可以做得更好吗?

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for values in counter.itervalues():
        if values > 1: 
            # do something

Can I do better?


回答 0

不是最高效的,而是简单明了的:

if len(x) > len(set(x)):
   pass # do something

短名单可能不会有太大的不同。

Not the most efficient, but straight forward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won’t make much of a difference for short lists.


回答 1

这里有两个班轮,它们也会提前退出:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

如果x的元素不可散列,那么您将不得不使用以下列表seen

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren’t hashable, then you’ll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False

回答 2

提前退出的解决方案可能是

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

但是对于小情况或如果提早退出并不常见,那么我希望len(x) != len(set(x))这是最快的方法。

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

however for small cases or if early-exiting is not the common case then I would expect len(x) != len(set(x)) being the fastest method.


回答 3

为了速度:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

for speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

回答 4

如何将所有条目添加到集合中并检查其长度呢?

len(set(x)) == len(x)

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

回答 5

替代a set,您可以使用dict

len({}.fromkeys(x)) == len(x)

Alternative to a set, you can use a dict.

len({}.fromkeys(x)) == len(x)

回答 6

完全使用排序和分组方式的另一种方法:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

它需要排序,但是在第一个重复值上退出。

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.


回答 7

这是一个有趣的递归O(N 2)版本:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(s[1:]) and (s[0] not in s[1:])
    return True

Here is a recursive O(N2) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(s[1:]) and (s[0] not in s[1:])
    return True

回答 8

这是递归的提前退出函数:

def distinct(L):
    if len(L) == 2:
        return L[0] != L[1]
    H = L[0]
    T = L[1:]
    if (H in T):
            return False
    else:
            return distinct(T)    

对于我来说,它足够快,而无需使用怪异的(慢速)转换,同时具有功能样式的方法。

Here is a recursive early-exit function:

def distinct(L):
    if len(L) == 2:
        return L[0] != L[1]
    H = L[0]
    T = L[1:]
    if (H in T):
            return False
    else:
            return distinct(T)    

It’s fast enough for me without using weird(slow) conversions while having a functional-style approach.


回答 9

这个怎么样

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1]==1

How about this

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1]==1

回答 10

您可以使用Yan的语法(len(x)> len(set(x))),但可以定义一个函数来代替set(x):

 def f5(seq, idfun=None): 
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

并做len(x)> len(f5(x))。这样会很快,而且还能保留订单。

此处的代码来自:http//www.peterbe.com/plog/uniqifiers-benchmark

You can use Yan’s syntax (len(x) > len(set(x))), but instead of set(x), define a function:

 def f5(seq, idfun=None): 
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark


回答 11

在Pandas数据框中使用类似的方法来测试列的内容是否包含唯一值:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

对我来说,这在包含一百万行的日期框架中的int变量上是瞬时的。

Using a similar approach in a Pandas dataframe to test if the contents of a column contains unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dateframe containing over a million rows.


回答 12

以上所有答案都很好,但我更喜欢使用30秒内的Pythonall_unique示例

您需要set()在给定列表上使用来删除重复项,并将其长度与列表的长度进行比较。

def all_unique(lst):
  return len(lst) == len(set(lst))

True如果平面列表中的所有值均为unique,则返回,False否则返回

x = [1,2,3,4,5,6]
y = [1,2,2,3,4,5]
all_unique(x) # True
all_unique(y) # False

all answer above are good but i prefer to use all_unique example from 30 seconds of python

you need to use set() on the given list to remove duplicates, compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

it returns True if all the values in a flat list are unique, False otherwise

x = [1,2,3,4,5,6]
y = [1,2,2,3,4,5]
all_unique(x) # True
all_unique(y) # False

回答 13

对于初学者:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True

For begginers:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True

关于Python的内置sort()方法

问题:关于Python的内置sort()方法

sort()Python 的内置方法使用什么算法?可以看看该方法的代码吗?

What algorithm is the built in sort() method in Python using? Is it possible to have a look at the code for that method?


回答 0

当然!代码在这里,从函数开始,islt进行相当一段时间;-)。就像克里斯的评论所暗示的那样,它是C代码。您还需要阅读此文本文件,以获得文本说明,结果等。

如果您喜欢阅读Java代码而不是C代码,则可以看看Joshua Bloch在Java中和Java中实现timsort(Joshua也是在1997年实现了仍在Java中使用的经过修改的mergesort的人,并且人们希望Java将最终改用他最近的timsort港口)。

timsort的Java端口的一些说明在这里,diff在这里(带有指向所有需要的文件的指针),关键文件在这里 -FWIW,尽管我是比Java程序员更好的C程序员,在这种情况下,我发现Joshua的Java代码比Tim的C代码更具可读性;-)。

Sure! The code’s here, starting with function islt and proceeding for QUITE a while;-). As Chris’s comment suggests, it’s C code. You’ll also want to read this text file for a textual explanation, results, etc etc.

If you prefer reading Java code than C code, you could look at Joshua Bloch’s implementation of timsort in and for Java (Joshua’s also the guy who implemented, in 1997, the modified mergesort that’s still used in Java, and one can hope that Java will eventually switch to his recent port of timsort).

Some explanation of the Java port of timsort is here, the diff is here (with pointers to all needed files), the key file is here — FWIW, while I’m a better C programmer than Java programmer, in this case I find Joshua’s Java code more readable overall than Tim’s C code;-).


回答 1

我只是想提供一个非常有用的链接,而我在Alex的其他全面答案中却没有找到它:对Python timsort的高级解释(带有图形可视化!)。

(是的,该算法现在基本上称为Timsort

I just wanted to supply a very helpful link that I missed in Alex’s otherwise comprehensive answer: A high-level explanation of Python’s timsort (with graph visualizations!).

(Yes, the algorithm is basically known as Timsort now)


回答 2

在早期的python版本中,sort函数实现了quicksort的修改版本。但是,它被认为是不稳定的,从2.3版本开始,他们改用自适应合并排序算法。

In early python-versions, the sort function implemented a modified version of quicksort. However, it was deemed unstable and as of 2.3 they switched to using an adaptive mergesort algorithm.


如何实现二叉树?

问题:如何实现二叉树?

哪种最佳数据结构可用于在Python中实现二叉树?

Which is the best data structure that can be used to implement a binary tree in Python?


回答 0

这是二进制搜索树的简单递归实现。

#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if val < node.v:
            if node.l is not None:
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if node.r is not None:
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if self.root is not None:
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if val == node.v:
            return node
        elif (val < node.v and node.l is not None):
            self._find(val, node.l)
        elif (val > node.v and node.r is not None):
            self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if self.root is not None:
            self._printTree(self.root)

    def _printTree(self, node):
        if node is not None:
            self._printTree(node.l)
            print(str(node.v) + ' ')
            self._printTree(node.r)

#     3
# 0     4
#   2      8
tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print(tree.find(3).v)
print(tree.find(10))
tree.deleteTree()
tree.printTree()

Here is my simple recursive implementation of binary search tree.

#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if val < node.v:
            if node.l is not None:
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if node.r is not None:
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if self.root is not None:
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if val == node.v:
            return node
        elif (val < node.v and node.l is not None):
            self._find(val, node.l)
        elif (val > node.v and node.r is not None):
            self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if self.root is not None:
            self._printTree(self.root)

    def _printTree(self, node):
        if node is not None:
            self._printTree(node.l)
            print(str(node.v) + ' ')
            self._printTree(node.r)

#     3
# 0     4
#   2      8
tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print(tree.find(3).v)
print(tree.find(10))
tree.deleteTree()
tree.printTree()

回答 1

# simple binary tree
# in this implementation, a node is inserted between an existing node and the root


class BinaryTree():

    def __init__(self,rootid):
      self.left = None
      self.right = None
      self.rootid = rootid

    def getLeftChild(self):
        return self.left
    def getRightChild(self):
        return self.right
    def setNodeValue(self,value):
        self.rootid = value
    def getNodeValue(self):
        return self.rootid

    def insertRight(self,newNode):
        if self.right == None:
            self.right = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.right = self.right
            self.right = tree

    def insertLeft(self,newNode):
        if self.left == None:
            self.left = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.left = self.left
            self.left = tree


def printTree(tree):
        if tree != None:
            printTree(tree.getLeftChild())
            print(tree.getNodeValue())
            printTree(tree.getRightChild())



# test tree

def testTree():
    myTree = BinaryTree("Maud")
    myTree.insertLeft("Bob")
    myTree.insertRight("Tony")
    myTree.insertRight("Steven")
    printTree(myTree)

在此处了解更多信息:-这是一个非常简单的实现二进制树。

是一个很好的教程,中间有问题

# simple binary tree
# in this implementation, a node is inserted between an existing node and the root


class BinaryTree():

    def __init__(self,rootid):
      self.left = None
      self.right = None
      self.rootid = rootid

    def getLeftChild(self):
        return self.left
    def getRightChild(self):
        return self.right
    def setNodeValue(self,value):
        self.rootid = value
    def getNodeValue(self):
        return self.rootid

    def insertRight(self,newNode):
        if self.right == None:
            self.right = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.right = self.right
            self.right = tree

    def insertLeft(self,newNode):
        if self.left == None:
            self.left = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.left = self.left
            self.left = tree


def printTree(tree):
        if tree != None:
            printTree(tree.getLeftChild())
            print(tree.getNodeValue())
            printTree(tree.getRightChild())



# test tree

def testTree():
    myTree = BinaryTree("Maud")
    myTree.insertLeft("Bob")
    myTree.insertRight("Tony")
    myTree.insertRight("Steven")
    printTree(myTree)

Read more about it Here:-This is a very simple implementation of a binary tree.

This is a nice tutorial with questions in between


回答 2

[采访所需的内容] Node类是足以表示二叉树的数据结构。

(尽管其他答案大多数都是正确的,但对于二叉树而言,它们不是必需的:无需扩展对象类,无需成为BST,无需导入双端队列)。

class Node:

    def __init__(self, value = None):
        self.left  = None
        self.right = None
        self.value = value

这是一棵树的例子:

n1 = Node(1)
n2 = Node(2)
n3 = Node(3)
n1.left  = n2
n1.right = n3

在此示例中,n1是具有n2,n3作为其子级的树的根。

[What you need for interviews] A Node class is the sufficient data structure to represent a binary tree.

(While other answers are mostly correct, they are not required for a binary tree: no need to extend object class, no need to be a BST, no need to import deque).

class Node:

    def __init__(self, value = None):
        self.left  = None
        self.right = None
        self.value = value

Here is an example of a tree:

n1 = Node(1)
n2 = Node(2)
n3 = Node(3)
n1.left  = n2
n1.right = n3

In this example n1 is the root of the tree having n2, n3 as its children.


回答 3

BST在Python中的简单实现

class TreeNode:
    def __init__(self, value):
        self.left = None
        self.right = None
        self.data = value

class Tree:
    def __init__(self):
        self.root = None

    def addNode(self, node, value):
        if(node==None):
            self.root = TreeNode(value)
        else:
            if(value<node.data):
                if(node.left==None):
                    node.left = TreeNode(value)
                else:
                    self.addNode(node.left, value)
            else:
                if(node.right==None):
                    node.right = TreeNode(value)
                else:
                    self.addNode(node.right, value)

    def printInorder(self, node):
        if(node!=None):
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)

def main():
    testTree = Tree()
    testTree.addNode(testTree.root, 200)
    testTree.addNode(testTree.root, 300)
    testTree.addNode(testTree.root, 100)
    testTree.addNode(testTree.root, 30)
    testTree.printInorder(testTree.root)

Simple implementation of BST in Python

class TreeNode:
    def __init__(self, value):
        self.left = None
        self.right = None
        self.data = value

class Tree:
    def __init__(self):
        self.root = None

    def addNode(self, node, value):
        if(node==None):
            self.root = TreeNode(value)
        else:
            if(value<node.data):
                if(node.left==None):
                    node.left = TreeNode(value)
                else:
                    self.addNode(node.left, value)
            else:
                if(node.right==None):
                    node.right = TreeNode(value)
                else:
                    self.addNode(node.right, value)

    def printInorder(self, node):
        if(node!=None):
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)

def main():
    testTree = Tree()
    testTree.addNode(testTree.root, 200)
    testTree.addNode(testTree.root, 300)
    testTree.addNode(testTree.root, 100)
    testTree.addNode(testTree.root, 30)
    testTree.printInorder(testTree.root)

回答 4

使用列表实现二叉树的一种非常快捷的方法。这不是最有效的方法,也不能很好地处理nil值。但这非常透明(至少对我而言):

def _add(node, v):
    new = [v, [], []]
    if node:
        left, right = node[1:]
        if not left:
            left.extend(new)
        elif not right:
            right.extend(new)
        else:
            _add(left, v)
    else:
        node.extend(new)

def binary_tree(s):
    root = []
    for e in s:
        _add(root, e)
    return root

def traverse(n, order):
    if n:
        v = n[0]
        if order == 'pre':
            yield v
        for left in traverse(n[1], order):
            yield left
        if order == 'in':
            yield v
        for right in traverse(n[2], order):
            yield right
        if order == 'post':
            yield v

从可迭代构造树:

 >>> tree = binary_tree('A B C D E'.split())
 >>> print tree
 ['A', ['B', ['D', [], []], ['E', [], []]], ['C', [], []]]

遍历一棵树:

 >>> list(traverse(tree, 'pre')), list(traverse(tree, 'in')), list(traverse(tree, 'post'))
 (['A', 'B', 'D', 'E', 'C'],
  ['D', 'B', 'E', 'A', 'C'],
  ['D', 'E', 'B', 'C', 'A'])

A very quick ‘n dirty way of implementing a binary tree using lists. Not the most efficient, nor does it handle nil values all too well. But it’s very transparent (at least to me):

def _add(node, v):
    new = [v, [], []]
    if node:
        left, right = node[1:]
        if not left:
            left.extend(new)
        elif not right:
            right.extend(new)
        else:
            _add(left, v)
    else:
        node.extend(new)

def binary_tree(s):
    root = []
    for e in s:
        _add(root, e)
    return root

def traverse(n, order):
    if n:
        v = n[0]
        if order == 'pre':
            yield v
        for left in traverse(n[1], order):
            yield left
        if order == 'in':
            yield v
        for right in traverse(n[2], order):
            yield right
        if order == 'post':
            yield v

Constructing a tree from an iterable:

 >>> tree = binary_tree('A B C D E'.split())
 >>> print tree
 ['A', ['B', ['D', [], []], ['E', [], []]], ['C', [], []]]

Traversing a tree:

 >>> list(traverse(tree, 'pre')), list(traverse(tree, 'in')), list(traverse(tree, 'post'))
 (['A', 'B', 'D', 'E', 'C'],
  ['D', 'B', 'E', 'A', 'C'],
  ['D', 'E', 'B', 'C', 'A'])

回答 5

我不禁注意到这里的大多数答案都在实现二进制搜索树。二进制搜索树!=二进制树。

  • 二叉搜索树具有非常特殊的属性:对于任何节点X,X的密钥都大于其左子节点的任何后代的关键字,并且小于其右子节点的任何后代的关键字。

  • 二叉树不施加这样的限制。二叉树只是具有“键”元素和两个孩子的数据结构,分别是“左”和“右”。

  • 树是二进制树的更一般的情况,其中每个节点可以具有任意数量的子代。通常,每个节点都有一个“孩子”元素,其类型为列表/数组。

现在,为了回答OP的问题,我将在Python中包含Binary Tree的完整实现。给定它提供最佳的O(1)查找,存储每个BinaryTreeNode的基础数据结构是一个字典。我还实现了深度优先遍历和深度优先遍历。这些是在树上执行的非常常见的操作。

from collections import deque

class BinaryTreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def __repr__(self):
        return "%s l: (%s) r: (%s)" % (self.key, self.left, self.right)

    def __eq__(self, other):
        if self.key == other.key and \
            self.right == other.right and \
                self.left == other.left:
            return True
        else:
            return False

class BinaryTree:
    def __init__(self, root_key=None):
        # maps from BinaryTreeNode key to BinaryTreeNode instance.
        # Thus, BinaryTreeNode keys must be unique.
        self.nodes = {}
        if root_key is not None:
            # create a root BinaryTreeNode
            self.root = BinaryTreeNode(root_key)
            self.nodes[root_key] = self.root

    def add(self, key, left_key=None, right_key=None):
        if key not in self.nodes:
            # BinaryTreeNode with given key does not exist, create it
            self.nodes[key] = BinaryTreeNode(key)
        # invariant: self.nodes[key] exists

        # handle left child
        if left_key is None:
            self.nodes[key].left = None
        else:
            if left_key not in self.nodes:
                self.nodes[left_key] = BinaryTreeNode(left_key)
            # invariant: self.nodes[left_key] exists
            self.nodes[key].left = self.nodes[left_key]

        # handle right child
        if right_key == None:
            self.nodes[key].right = None
        else:
            if right_key not in self.nodes:
                self.nodes[right_key] = BinaryTreeNode(right_key)
            # invariant: self.nodes[right_key] exists
            self.nodes[key].right = self.nodes[right_key]

    def remove(self, key):
        if key not in self.nodes:
            raise ValueError('%s not in tree' % key)
        # remove key from the list of nodes
        del self.nodes[key]
        # if node removed is left/right child, update parent node
        for k in self.nodes:
            if self.nodes[k].left and self.nodes[k].left.key == key:
                self.nodes[k].left = None
            if self.nodes[k].right and self.nodes[k].right.key == key:
                self.nodes[k].right = None
        return True

    def _height(self, node):
        if node is None:
            return 0
        else:
            return 1 + max(self._height(node.left), self._height(node.right))

    def height(self):
        return self._height(self.root)

    def size(self):
        return len(self.nodes)

    def __repr__(self):
        return str(self.traverse_inorder(self.root))

    def bfs(self, node):
        if not node or node not in self.nodes:
            return
        reachable = []    
        q = deque()
        # add starting node to queue
        q.append(node)
        while len(q):
            visit = q.popleft()
            # add currently visited BinaryTreeNode to list
            reachable.append(visit)
            # add left/right children as needed
            if visit.left:
                q.append(visit.left)
            if visit.right:
                q.append(visit.right)
        return reachable

    # visit left child, root, then right child
    def traverse_inorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_inorder(node.left, reachable)
        reachable.append(node.key)
        self.traverse_inorder(node.right, reachable)
        return reachable

    # visit left and right children, then root
    # root of tree is always last to be visited
    def traverse_postorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_postorder(node.left, reachable)
        self.traverse_postorder(node.right, reachable)
        reachable.append(node.key)
        return reachable

    # visit root, left, then right children
    # root is always visited first
    def traverse_preorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        reachable.append(node.key)
        self.traverse_preorder(node.left, reachable)
        self.traverse_preorder(node.right, reachable)
        return reachable

I can’t help but notice that most answers here are implementing a Binary Search Tree. Binary Search Tree != Binary Tree.

  • A Binary Search Tree has a very specific property: for any node X, X’s key is larger than the key of any descendent of its left child, and smaller than the key of any descendant of its right child.

  • A Binary Tree imposes no such restriction. A Binary Tree is simply a data structure with a ‘key’ element, and two children, say ‘left’ and ‘right’.

  • A Tree is an even more general case of a Binary Tree where each node can have an arbitrary number of children. Typically, each node has a ‘children’ element which is of type list/array.

Now, to answer the OP’s question, I am including a full implementation of a Binary Tree in Python. The underlying data structure storing each BinaryTreeNode is a dictionary, given it offers optimal O(1) lookups. I’ve also implemented depth-first and breadth-first traversals. These are very common operations performed on trees.

from collections import deque

class BinaryTreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def __repr__(self):
        return "%s l: (%s) r: (%s)" % (self.key, self.left, self.right)

    def __eq__(self, other):
        if self.key == other.key and \
            self.right == other.right and \
                self.left == other.left:
            return True
        else:
            return False

class BinaryTree:
    def __init__(self, root_key=None):
        # maps from BinaryTreeNode key to BinaryTreeNode instance.
        # Thus, BinaryTreeNode keys must be unique.
        self.nodes = {}
        if root_key is not None:
            # create a root BinaryTreeNode
            self.root = BinaryTreeNode(root_key)
            self.nodes[root_key] = self.root

    def add(self, key, left_key=None, right_key=None):
        if key not in self.nodes:
            # BinaryTreeNode with given key does not exist, create it
            self.nodes[key] = BinaryTreeNode(key)
        # invariant: self.nodes[key] exists

        # handle left child
        if left_key is None:
            self.nodes[key].left = None
        else:
            if left_key not in self.nodes:
                self.nodes[left_key] = BinaryTreeNode(left_key)
            # invariant: self.nodes[left_key] exists
            self.nodes[key].left = self.nodes[left_key]

        # handle right child
        if right_key == None:
            self.nodes[key].right = None
        else:
            if right_key not in self.nodes:
                self.nodes[right_key] = BinaryTreeNode(right_key)
            # invariant: self.nodes[right_key] exists
            self.nodes[key].right = self.nodes[right_key]

    def remove(self, key):
        if key not in self.nodes:
            raise ValueError('%s not in tree' % key)
        # remove key from the list of nodes
        del self.nodes[key]
        # if node removed is left/right child, update parent node
        for k in self.nodes:
            if self.nodes[k].left and self.nodes[k].left.key == key:
                self.nodes[k].left = None
            if self.nodes[k].right and self.nodes[k].right.key == key:
                self.nodes[k].right = None
        return True

    def _height(self, node):
        if node is None:
            return 0
        else:
            return 1 + max(self._height(node.left), self._height(node.right))

    def height(self):
        return self._height(self.root)

    def size(self):
        return len(self.nodes)

    def __repr__(self):
        return str(self.traverse_inorder(self.root))

    def bfs(self, node):
        if not node or node not in self.nodes:
            return
        reachable = []    
        q = deque()
        # add starting node to queue
        q.append(node)
        while len(q):
            visit = q.popleft()
            # add currently visited BinaryTreeNode to list
            reachable.append(visit)
            # add left/right children as needed
            if visit.left:
                q.append(visit.left)
            if visit.right:
                q.append(visit.right)
        return reachable

    # visit left child, root, then right child
    def traverse_inorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_inorder(node.left, reachable)
        reachable.append(node.key)
        self.traverse_inorder(node.right, reachable)
        return reachable

    # visit left and right children, then root
    # root of tree is always last to be visited
    def traverse_postorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_postorder(node.left, reachable)
        self.traverse_postorder(node.right, reachable)
        reachable.append(node.key)
        return reachable

    # visit root, left, then right children
    # root is always visited first
    def traverse_preorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        reachable.append(node.key)
        self.traverse_preorder(node.left, reachable)
        self.traverse_preorder(node.right, reachable)
        return reachable

回答 6

你不需要两节课

class Tree:
    val = None
    left = None
    right = None

    def __init__(self, val):
        self.val = val


    def insert(self, val):
        if self.val is not None:
            if val < self.val:
                if self.left is not None:
                    self.left.insert(val)
                else:
                    self.left = Tree(val)
            elif val > self.val:
                if self.right is not None:
                    self.right.insert(val)
                else:
                    self.right = Tree(val)
            else:
                return
        else:
            self.val = val
            print("new node added")

    def showTree(self):
        if self.left is not None:
            self.left.showTree()
        print(self.val, end = ' ')
        if self.right is not None:
            self.right.showTree()

you don’t need to have two classes

class Tree:
    val = None
    left = None
    right = None

    def __init__(self, val):
        self.val = val


    def insert(self, val):
        if self.val is not None:
            if val < self.val:
                if self.left is not None:
                    self.left.insert(val)
                else:
                    self.left = Tree(val)
            elif val > self.val:
                if self.right is not None:
                    self.right.insert(val)
                else:
                    self.right = Tree(val)
            else:
                return
        else:
            self.val = val
            print("new node added")

    def showTree(self):
        if self.left is not None:
            self.left.showTree()
        print(self.val, end = ' ')
        if self.right is not None:
            self.right.showTree()

回答 7

多一点“ Pythonic”?

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return str(self.value)



class BST:
    def __init__(self):
        self.root = None

    def __repr__(self):
        self.sorted = []
        self.get_inorder(self.root)
        return str(self.sorted)

    def get_inorder(self, node):
        if node:
            self.get_inorder(node.left)
            self.sorted.append(str(node.value))
            self.get_inorder(node.right)

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self._add(self.root, value)

    def _add(self, node, value):
        if value <= node.value:
            if node.left:
                self._add(node.left, value)
            else:
                node.left = Node(value)
        else:
            if node.right:
                self._add(node.right, value)
            else:
                node.right = Node(value)



from random import randint

bst = BST()

for i in range(100):
    bst.add(randint(1, 1000))
print (bst)

A little more “Pythonic” ?

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return str(self.value)



class BST:
    def __init__(self):
        self.root = None

    def __repr__(self):
        self.sorted = []
        self.get_inorder(self.root)
        return str(self.sorted)

    def get_inorder(self, node):
        if node:
            self.get_inorder(node.left)
            self.sorted.append(str(node.value))
            self.get_inorder(node.right)

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self._add(self.root, value)

    def _add(self, node, value):
        if value <= node.value:
            if node.left:
                self._add(node.left, value)
            else:
                node.left = Node(value)
        else:
            if node.right:
                self._add(node.right, value)
            else:
                node.right = Node(value)



from random import randint

bst = BST()

for i in range(100):
    bst.add(randint(1, 1000))
print (bst)

回答 8

#!/usr/bin/python

class BinaryTree:
    def __init__(self, left, right, data):
        self.left = left
        self.right = right
        self.data = data


    def pre_order_traversal(root):
        print(root.data, end=' ')

        if root.left != None:
            pre_order_traversal(root.left)

        if root.right != None:
            pre_order_traversal(root.right)

    def in_order_traversal(root):
        if root.left != None:
            in_order_traversal(root.left)
        print(root.data, end=' ')
        if root.right != None:
            in_order_traversal(root.right)

    def post_order_traversal(root):
        if root.left != None:
            post_order_traversal(root.left)
        if root.right != None:
            post_order_traversal(root.right)
        print(root.data, end=' ')
#!/usr/bin/python

class BinaryTree:
    def __init__(self, left, right, data):
        self.left = left
        self.right = right
        self.data = data


    def pre_order_traversal(root):
        print(root.data, end=' ')

        if root.left != None:
            pre_order_traversal(root.left)

        if root.right != None:
            pre_order_traversal(root.right)

    def in_order_traversal(root):
        if root.left != None:
            in_order_traversal(root.left)
        print(root.data, end=' ')
        if root.right != None:
            in_order_traversal(root.right)

    def post_order_traversal(root):
        if root.left != None:
            post_order_traversal(root.left)
        if root.right != None:
            post_order_traversal(root.right)
        print(root.data, end=' ')

回答 9

一个 Node基类连接的节点的是一个标准的做法。这些可能很难想象。

摘自有关Python模式-实现图形文章,请考虑一个简单的字典:

给定

二叉树

               a
              / \
             b   c
            / \   \
           d   e   f

制作一个唯一节点的字典:

tree = {
   "a": ["b", "c"],
   "b": ["d", "e"],
   "c": [None, "f"],
   "d": [None, None],
   "e": [None, None],
   "f": [None, None],
}

细节

  • 每个键值对都是一个唯一的节点指向其子级。
  • 列表(或元组)包含一对有序的左/右子级。
  • 有命令命令插入字典,假定第一个条目为根。
  • 常用方法可以是使dict变异或遍历的函数(请参阅参考资料find_all_paths())。

基于树的功能通常包括以下常见操作:

  • 遍历:以给定的顺序产生每个节点(通常从左到右)
    • 广度优先搜索(BFS):遍历级别
    • 深度优先搜索(DFS):先进行遍历分支(前/后/后顺序)
  • insert:根据子节点数将节点添加到树中
  • remove:根据子节点数删除节点
  • 更新:将丢失的节点从一棵树合并到另一棵树
  • visit:得出遍历节点的值

尝试实施所有这些操作。在这里,我们演示这些功能之一 -BFS遍历:

import collections as ct


def traverse(tree):
    """Yield nodes from a tree via BFS."""
    q = ct.deque()                                         # 1
    root = next(iter(tree))                                # 2
    q.append(root)

    while q:
        node = q.popleft()
        children = filter(None, tree.get(node))
        for n in children:                                 # 3 
            q.append(n)
        yield node

list(traverse(tree))
# ['a', 'b', 'c', 'd', 'e', 'f']

这是应用于节点和子字典的广度优先搜索(级别顺序)算法

  1. 初始化FIFO队列。我们使用deque,但使用queuelist作品(后者效率低下)。
  2. 获取并排队根节点(假设根是字典中的第一个条目,Python 3.6+)。
  3. 迭代地使一个节点出队,使其子节点入队并产生节点值。

另请参阅此有关树的深入教程


洞察力

一般而言,遍历有很多好处,我们只需将队列替换为堆栈即可轻松地将后者的迭代方法更改为深度优先搜索(DFS)(即LIFO队列))。这仅表示我们从排队的同一侧出队。DFS允许我们搜索每个分支。

怎么样?由于我们使用deque,我们可以通过更改node = q.popleft()node = q.pop()(右)来模拟堆栈。结果是正确的,预购的DFS['a', 'c', 'f', 'b', 'e', 'd']

A Node-based class of connected nodes is a standard approach. These can be hard to visualize.

Motivated from an essay on Python Patterns – Implementing Graphs, consider a simple dictionary:

Given

A binary tree

               a
              / \
             b   c
            / \   \
           d   e   f

Code

Make a dictionary of unique nodes:

tree = {
   "a": ["b", "c"],
   "b": ["d", "e"],
   "c": [None, "f"],
   "d": [None, None],
   "e": [None, None],
   "f": [None, None],
}

Details

  • Each key-value pair is a unique node pointing to its children.
  • A list (or tuple) holds an ordered pair of left/right children.
  • With a dict having ordered insertion, assume the first entry is the root.
  • Common methods can be functions that mutate or traverse the dict (see find_all_paths()).

Tree-based functions often include the following common operations:

  • traverse: yield each node in a given order (usually left-to-right)
    • breadth-first search (BFS): traverse levels
    • depth-first search (DFS): traverse branches first (pre-/in-/post-order)
  • insert: add a node to the tree depending on the number of children
  • remove: remove a node depending on the number of children
  • update: merge missing nodes from one tree to the other
  • visit: yield the value of a traversed node

Try implementing all of these operations. Here we demonstrate one of these functions – a BFS traversal:

Example

import collections as ct


def traverse(tree):
    """Yield nodes from a tree via BFS."""
    q = ct.deque()                                         # 1
    root = next(iter(tree))                                # 2
    q.append(root)

    while q:
        node = q.popleft()
        children = filter(None, tree.get(node))
        for n in children:                                 # 3 
            q.append(n)
        yield node

list(traverse(tree))
# ['a', 'b', 'c', 'd', 'e', 'f']

This is a breadth-first search (level-order) algorithm applied to a dict of nodes and children.

  1. Initialize a FIFO queue. We use a deque, but a queue or a list works (the latter is inefficient).
  2. Get and enqueue the root node (assumes the root is the first entry in the dict, Python 3.6+).
  3. Iteratively dequeue a node, enqueue its children and yield the node value.

See also this in-depth tutorial on trees.


Insight

Something great about traversals in general, we can easily alter the latter iterative approach to depth-first search (DFS) by simply replacing the queue with a stack (a.k.a LIFO Queue). This simply means we dequeue from the same side that we enqueue. DFS allows us to search each branch.

How? Since we are using a deque, we can emulate a stack by changing node = q.popleft() to node = q.pop() (right). The result is a right-favored, pre-ordered DFS: ['a', 'c', 'f', 'b', 'e', 'd'].


回答 10

import random

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def length(self):
        return self.size

    def inorder(self, node):
        if node == None:
            return None
        else:
            self.inorder(node.left)
            print node.key,
            self.inorder(node.right)

    def search(self, k):
        node = self.root
        while node != None:
            if node.key == k:
                return node
            if node.key > k:
                node = node.left
            else:
                node = node.right
        return None

    def minimum(self, node):
        x = None
        while node.left != None:
            x = node.left
            node = node.left
        return x

    def maximum(self, node):
        x = None
        while node.right != None:
            x = node.right
            node = node.right
        return x

    def successor(self, node):
        parent = None
        if node.right != None:
            return self.minimum(node.right)
        parent = node.p
        while parent != None and node == parent.right:
            node = parent
            parent = parent.p
        return parent

    def predecessor(self, node):
        parent = None
        if node.left != None:
            return self.maximum(node.left)
        parent = node.p
        while parent != None and node == parent.left:
            node = parent
            parent = parent.p
        return parent

    def insert(self, k):
        t = TreeNode(k)
        parent = None
        node = self.root
        while node != None:
            parent = node
            if node.key > t.key:
                node = node.left
            else:
                node = node.right
        t.p = parent
        if parent == None:
            self.root = t
        elif t.key < parent.key:
            parent.left = t
        else:
            parent.right = t
        return t


    def delete(self, node):
        if node.left == None:
            self.transplant(node, node.right)
        elif node.right == None:
            self.transplant(node, node.left)
        else:
            succ = self.minimum(node.right)
            if succ.p != node:
                self.transplant(succ, succ.right)
                succ.right = node.right
                succ.right.p = succ
            self.transplant(node, succ)
            succ.left = node.left
            succ.left.p = succ

    def transplant(self, node, newnode):
        if node.p == None:
            self.root = newnode
        elif node == node.p.left:
            node.p.left = newnode
        else:
            node.p.right = newnode
        if newnode != None:
            newnode.p = node.p
import random

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def length(self):
        return self.size

    def inorder(self, node):
        if node == None:
            return None
        else:
            self.inorder(node.left)
            print node.key,
            self.inorder(node.right)

    def search(self, k):
        node = self.root
        while node != None:
            if node.key == k:
                return node
            if node.key > k:
                node = node.left
            else:
                node = node.right
        return None

    def minimum(self, node):
        x = None
        while node.left != None:
            x = node.left
            node = node.left
        return x

    def maximum(self, node):
        x = None
        while node.right != None:
            x = node.right
            node = node.right
        return x

    def successor(self, node):
        parent = None
        if node.right != None:
            return self.minimum(node.right)
        parent = node.p
        while parent != None and node == parent.right:
            node = parent
            parent = parent.p
        return parent

    def predecessor(self, node):
        parent = None
        if node.left != None:
            return self.maximum(node.left)
        parent = node.p
        while parent != None and node == parent.left:
            node = parent
            parent = parent.p
        return parent

    def insert(self, k):
        t = TreeNode(k)
        parent = None
        node = self.root
        while node != None:
            parent = node
            if node.key > t.key:
                node = node.left
            else:
                node = node.right
        t.p = parent
        if parent == None:
            self.root = t
        elif t.key < parent.key:
            parent.left = t
        else:
            parent.right = t
        return t


    def delete(self, node):
        if node.left == None:
            self.transplant(node, node.right)
        elif node.right == None:
            self.transplant(node, node.left)
        else:
            succ = self.minimum(node.right)
            if succ.p != node:
                self.transplant(succ, succ.right)
                succ.right = node.right
                succ.right.p = succ
            self.transplant(node, succ)
            succ.left = node.left
            succ.left.p = succ

    def transplant(self, node, newnode):
        if node.p == None:
            self.root = newnode
        elif node == node.p.left:
            node.p.left = newnode
        else:
            node.p.right = newnode
        if newnode != None:
            newnode.p = node.p

回答 11

此实现支持插入,查找和删除操作,而不会破坏树的结构。这不是平衡树。

# Class for construct the nodes of the tree. (Subtrees)
class Node:
def __init__(self, key, parent_node = None):
    self.left = None
    self.right = None
    self.key = key
    if parent_node == None:
        self.parent = self
    else:
        self.parent = parent_node

# Class with the  structure of the tree. 
# This Tree is not balanced.
class Tree:
def __init__(self):
    self.root = None

# Insert a single element
def insert(self, x):
    if(self.root == None):
        self.root = Node(x)
    else:
        self._insert(x, self.root)

def _insert(self, x, node):
    if(x < node.key):
        if(node.left == None):
            node.left = Node(x, node)
        else:
            self._insert(x, node.left)
    else:
        if(node.right == None):
            node.right = Node(x, node)
        else:
            self._insert(x, node.right)

# Given a element, return a node in the tree with key x. 
def find(self, x):
    if(self.root == None):
        return None
    else:
        return self._find(x, self.root)
def _find(self, x, node):
    if(x == node.key):
        return node
    elif(x < node.key):
        if(node.left == None):
            return None
        else:
            return self._find(x, node.left)
    elif(x > node.key):
        if(node.right == None):
            return None
        else:
            return self._find(x, node.right)

# Given a node, return the node in the tree with the next largest element.
def next(self, node):
    if node.right != None:
        return self._left_descendant(node.right)
    else:
        return self._right_ancestor(node)

def _left_descendant(self, node):
    if node.left == None:
        return node
    else:
        return self._left_descendant(node.left)

def _right_ancestor(self, node):
    if node.key <= node.parent.key:
        return node.parent
    else:
        return self._right_ancestor(node.parent)

# Delete an element of the tree
def delete(self, x):
    node = self.find(x)
    if node == None:
        print(x, "isn't in the tree")
    else:
        if node.right == None:
            if node.left == None:
                if node.key < node.parent.key:
                    node.parent.left = None
                    del node # Clean garbage
                else:
                    node.parent.right = None
                    del Node # Clean garbage
            else:
                node.key = node.left.key
                node.left = None
        else:
            x = self.next(node)
            node.key = x.key
            x = None


# tests
t = Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)

t.delete(8)
t.delete(5)

t.insert(9)
t.insert(1)

t.delete(2)
t.delete(100)

# Remember: Find method return the node object. 
# To return a number use t.find(nº).key
# But it will cause an error if the number is not in the tree.
print(t.find(5)) 
print(t.find(8))
print(t.find(4))
print(t.find(6))
print(t.find(9))

This implementation supports insert, find and delete operations without destroy the structure of the tree. This is not a banlanced tree.

# Class for construct the nodes of the tree. (Subtrees)
class Node:
def __init__(self, key, parent_node = None):
    self.left = None
    self.right = None
    self.key = key
    if parent_node == None:
        self.parent = self
    else:
        self.parent = parent_node

# Class with the  structure of the tree. 
# This Tree is not balanced.
class Tree:
def __init__(self):
    self.root = None

# Insert a single element
def insert(self, x):
    if(self.root == None):
        self.root = Node(x)
    else:
        self._insert(x, self.root)

def _insert(self, x, node):
    if(x < node.key):
        if(node.left == None):
            node.left = Node(x, node)
        else:
            self._insert(x, node.left)
    else:
        if(node.right == None):
            node.right = Node(x, node)
        else:
            self._insert(x, node.right)

# Given a element, return a node in the tree with key x. 
def find(self, x):
    if(self.root == None):
        return None
    else:
        return self._find(x, self.root)
def _find(self, x, node):
    if(x == node.key):
        return node
    elif(x < node.key):
        if(node.left == None):
            return None
        else:
            return self._find(x, node.left)
    elif(x > node.key):
        if(node.right == None):
            return None
        else:
            return self._find(x, node.right)

# Given a node, return the node in the tree with the next largest element.
def next(self, node):
    if node.right != None:
        return self._left_descendant(node.right)
    else:
        return self._right_ancestor(node)

def _left_descendant(self, node):
    if node.left == None:
        return node
    else:
        return self._left_descendant(node.left)

def _right_ancestor(self, node):
    if node.key <= node.parent.key:
        return node.parent
    else:
        return self._right_ancestor(node.parent)

# Delete an element of the tree
def delete(self, x):
    node = self.find(x)
    if node == None:
        print(x, "isn't in the tree")
    else:
        if node.right == None:
            if node.left == None:
                if node.key < node.parent.key:
                    node.parent.left = None
                    del node # Clean garbage
                else:
                    node.parent.right = None
                    del Node # Clean garbage
            else:
                node.key = node.left.key
                node.left = None
        else:
            x = self.next(node)
            node.key = x.key
            x = None


# tests
t = Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)

t.delete(8)
t.delete(5)

t.insert(9)
t.insert(1)

t.delete(2)
t.delete(100)

# Remember: Find method return the node object. 
# To return a number use t.find(nº).key
# But it will cause an error if the number is not in the tree.
print(t.find(5)) 
print(t.find(8))
print(t.find(4))
print(t.find(6))
print(t.find(9))

回答 12

我知道已经发布了许多好的解决方案,但是对于二叉树,我通常采用不同的方法:使用某些Node类并直接实现它更具可读性,但是当您有很多节点时,对于内存可能会变得非常贪婪,所以我建议增加一层复杂性并将节点存储在python列表中,然后仅使用该列表来模拟树的行为。

您仍然可以定义Node类,以在需要时最终表示树中的节点,但是将它们以简单的形式[value,left,right]保留在列表中将使用一半的内存或更少的内存!

这是二进制搜索树类的快速示例,该类将节点存储在数组中。它提供了基本功能,例如添加,删除,查找…

"""
Basic Binary Search Tree class without recursion...
"""

__author__ = "@fbparis"

class Node(object):
    __slots__ = "value", "parent", "left", "right"
    def __init__(self, value, parent=None, left=None, right=None):
        self.value = value
        self.parent = parent
        self.left = left
        self.right = right

    def __repr__(self):
        return "<%s object at %s: parent=%s, left=%s, right=%s, value=%s>" % (self.__class__.__name__, hex(id(self)), self.parent, self.left, self.right, self.value)

class BinarySearchTree(object):
    __slots__ = "_tree"
    def __init__(self, *args):
        self._tree = []
        if args:
            for x in args[0]:
                self.add(x)

    def __len__(self):
        return len(self._tree)

    def __repr__(self):
        return "<%s object at %s with %d nodes>" % (self.__class__.__name__, hex(id(self)), len(self))

    def __str__(self, nodes=None, level=0):
        ret = ""
        if nodes is None:
            if len(self):
                nodes = [0]
            else:
                nodes = []
        for node in nodes:
            if node is None:
                continue
            ret += "-" * level + " %s\n" % self._tree[node][0]
            ret += self.__str__(self._tree[node][2:4], level + 1)
        if level == 0:
            ret = ret.strip()
        return ret

    def __contains__(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return False
            return True
        return False

    def __eq__(self, other):
        return self._tree == other._tree

    def add(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    b = self._tree[node_index][2]
                    k = 2
                else:
                    b = self._tree[node_index][3]
                    k = 3
                if b is None:
                    self._tree[node_index][k] = len(self)
                    self._tree.append([value, node_index, None, None])
                    break
                node_index = b
        else:
            self._tree.append([value, None, None, None])

    def remove(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    raise KeyError
            if self._tree[node_index][2] is not None:
                b, d = 2, 3
            elif self._tree[node_index][3] is not None:
                b, d = 3, 2
            else:
                i = node_index
                b = None
            if b is not None:
                i = self._tree[node_index][b]
                while self._tree[i][d] is not None:
                    i = self._tree[i][d]
                p = self._tree[i][1]
                b = self._tree[i][b]
                if p == node_index:
                    self._tree[p][5-d] = b
                else:
                    self._tree[p][d] = b
                if b is not None:
                    self._tree[b][1] = p
                self._tree[node_index][0] = self._tree[i][0]
            else:
                p = self._tree[i][1]
                if p is not None:
                    if self._tree[p][2] == i:
                        self._tree[p][2] = None
                    else:
                        self._tree[p][3] = None
            last = self._tree.pop()
            n = len(self)
            if i < n:
                self._tree[i] = last[:]
                if last[2] is not None:
                    self._tree[last[2]][1] = i
                if last[3] is not None:
                    self._tree[last[3]][1] = i
                if self._tree[last[1]][2] == n:
                    self._tree[last[1]][2] = i
                else:
                    self._tree[last[1]][3] = i
        else:
            raise KeyError

    def find(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return None
            return Node(*self._tree[node_index])
        return None

我添加了一个父属性,以便您可以删除任何节点并维护BST结构。

抱歉,为了便于阅读,尤其是对于“删除”功能。基本上,当一个节点被删除时,我们弹出树数组并用最后一个元素替换它(除非我们想删除最后一个节点)。为了维持BST结构,将删除的节点替换为其左侧子节点的最大值或右侧子节点的最小值,并且必须执行一些操作才能使索引有效,但它必须足够快。

我将这种技术用于更高级的东西,用内部基数trie构建了一些大单词字典,并且我能够将内存消耗除以7-8(您可以在此处看到示例:https : //gist.github.com/fbparis / b3ddd5673b603b42c880974b23db7cda

I know many good solutions have already been posted but I usually have a different approach for binary trees: going with some Node class and implementing it directly is more readable but when you have a lot of nodes it can become very greedy regarding memory, so I suggest adding one layer of complexity and storing the nodes in a python list, and then simulating a tree behavior using only the list.

You can still define a Node class to finally represent the nodes in the tree when needed, but keeping them in a simple form [value, left, right] in a list will use half the memory or less!

Here is a quick example of a Binary Search Tree class storing the nodes in an array. It provides basic fonctions such as add, remove, find…

"""
Basic Binary Search Tree class without recursion...
"""

__author__ = "@fbparis"

class Node(object):
    __slots__ = "value", "parent", "left", "right"
    def __init__(self, value, parent=None, left=None, right=None):
        self.value = value
        self.parent = parent
        self.left = left
        self.right = right

    def __repr__(self):
        return "<%s object at %s: parent=%s, left=%s, right=%s, value=%s>" % (self.__class__.__name__, hex(id(self)), self.parent, self.left, self.right, self.value)

class BinarySearchTree(object):
    __slots__ = "_tree"
    def __init__(self, *args):
        self._tree = []
        if args:
            for x in args[0]:
                self.add(x)

    def __len__(self):
        return len(self._tree)

    def __repr__(self):
        return "<%s object at %s with %d nodes>" % (self.__class__.__name__, hex(id(self)), len(self))

    def __str__(self, nodes=None, level=0):
        ret = ""
        if nodes is None:
            if len(self):
                nodes = [0]
            else:
                nodes = []
        for node in nodes:
            if node is None:
                continue
            ret += "-" * level + " %s\n" % self._tree[node][0]
            ret += self.__str__(self._tree[node][2:4], level + 1)
        if level == 0:
            ret = ret.strip()
        return ret

    def __contains__(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return False
            return True
        return False

    def __eq__(self, other):
        return self._tree == other._tree

    def add(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    b = self._tree[node_index][2]
                    k = 2
                else:
                    b = self._tree[node_index][3]
                    k = 3
                if b is None:
                    self._tree[node_index][k] = len(self)
                    self._tree.append([value, node_index, None, None])
                    break
                node_index = b
        else:
            self._tree.append([value, None, None, None])

    def remove(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    raise KeyError
            if self._tree[node_index][2] is not None:
                b, d = 2, 3
            elif self._tree[node_index][3] is not None:
                b, d = 3, 2
            else:
                i = node_index
                b = None
            if b is not None:
                i = self._tree[node_index][b]
                while self._tree[i][d] is not None:
                    i = self._tree[i][d]
                p = self._tree[i][1]
                b = self._tree[i][b]
                if p == node_index:
                    self._tree[p][5-d] = b
                else:
                    self._tree[p][d] = b
                if b is not None:
                    self._tree[b][1] = p
                self._tree[node_index][0] = self._tree[i][0]
            else:
                p = self._tree[i][1]
                if p is not None:
                    if self._tree[p][2] == i:
                        self._tree[p][2] = None
                    else:
                        self._tree[p][3] = None
            last = self._tree.pop()
            n = len(self)
            if i < n:
                self._tree[i] = last[:]
                if last[2] is not None:
                    self._tree[last[2]][1] = i
                if last[3] is not None:
                    self._tree[last[3]][1] = i
                if self._tree[last[1]][2] == n:
                    self._tree[last[1]][2] = i
                else:
                    self._tree[last[1]][3] = i
        else:
            raise KeyError

    def find(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return None
            return Node(*self._tree[node_index])
        return None

I’ve added a parent attribute so that you can remove any node and maintain the BST structure.

Sorry for the readability, especially for the “remove” function. Basically, when a node is removed, we pop the tree array and replace it with the last element (except if we wanted to remove the last node). To maintain the BST structure, the removed node is replaced with the max of its left children or the min of its right children and some operations have to be done in order to keep the indexes valid but it’s fast enough.

I used this technique for more advanced stuff to build some big words dictionaries with an internal radix trie and I was able to divide memory consumption by 7-8 (you can see an example here: https://gist.github.com/fbparis/b3ddd5673b603b42c880974b23db7cda)


回答 13

二进制搜索树的良好实现,取自此处

'''
A binary search Tree
'''
from __future__ import print_function
class Node:

    def __init__(self, label, parent):
        self.label = label
        self.left = None
        self.right = None
        #Added in order to delete a node easier
        self.parent = parent

    def getLabel(self):
        return self.label

    def setLabel(self, label):
        self.label = label

    def getLeft(self):
        return self.left

    def setLeft(self, left):
        self.left = left

    def getRight(self):
        return self.right

    def setRight(self, right):
        self.right = right

    def getParent(self):
        return self.parent

    def setParent(self, parent):
        self.parent = parent

class BinarySearchTree:

    def __init__(self):
        self.root = None

    def insert(self, label):
        # Create a new Node
        new_node = Node(label, None)
        # If Tree is empty
        if self.empty():
            self.root = new_node
        else:
            #If Tree is not empty
            curr_node = self.root
            #While we don't get to a leaf
            while curr_node is not None:
                #We keep reference of the parent node
                parent_node = curr_node
                #If node label is less than current node
                if new_node.getLabel() < curr_node.getLabel():
                #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
            #We insert the new node in a leaf
            if new_node.getLabel() < parent_node.getLabel():
                parent_node.setLeft(new_node)
            else:
                parent_node.setRight(new_node)
            #Set parent to the new node
            new_node.setParent(parent_node)      

    def delete(self, label):
        if (not self.empty()):
            #Look for the node with that label
            node = self.getNode(label)
            #If the node exists
            if(node is not None):
                #If it has no children
                if(node.getLeft() is None and node.getRight() is None):
                    self.__reassignNodes(node, None)
                    node = None
                #Has only right children
                elif(node.getLeft() is None and node.getRight() is not None):
                    self.__reassignNodes(node, node.getRight())
                #Has only left children
                elif(node.getLeft() is not None and node.getRight() is None):
                    self.__reassignNodes(node, node.getLeft())
                #Has two children
                else:
                    #Gets the max value of the left branch
                    tmpNode = self.getMax(node.getLeft())
                    #Deletes the tmpNode
                    self.delete(tmpNode.getLabel())
                    #Assigns the value to the node to delete and keesp tree structure
                    node.setLabel(tmpNode.getLabel())

    def getNode(self, label):
        curr_node = None
        #If the tree is not empty
        if(not self.empty()):
            #Get tree root
            curr_node = self.getRoot()
            #While we don't find the node we look for
            #I am using lazy evaluation here to avoid NoneType Attribute error
            while curr_node is not None and curr_node.getLabel() is not label:
                #If node label is less than current node
                if label < curr_node.getLabel():
                    #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
        return curr_node

    def getMax(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the right branch
            curr_node = self.getRoot()
        if(not self.empty()):
            while(curr_node.getRight() is not None):
                curr_node = curr_node.getRight()
        return curr_node

    def getMin(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the left branch
            curr_node = self.getRoot()
        if(not self.empty()):
            curr_node = self.getRoot()
            while(curr_node.getLeft() is not None):
                curr_node = curr_node.getLeft()
        return curr_node

    def empty(self):
        if self.root is None:
            return True
        return False

    def __InOrderTraversal(self, curr_node):
        nodeList = []
        if curr_node is not None:
            nodeList.insert(0, curr_node)
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())
        return nodeList

    def getRoot(self):
        return self.root

    def __isRightChildren(self, node):
        if(node == node.getParent().getRight()):
            return True
        return False

    def __reassignNodes(self, node, newChildren):
        if(newChildren is not None):
            newChildren.setParent(node.getParent())
        if(node.getParent() is not None):
            #If it is the Right Children
            if(self.__isRightChildren(node)):
                node.getParent().setRight(newChildren)
            else:
                #Else it is the left children
                node.getParent().setLeft(newChildren)

    #This function traversal the tree. By default it returns an
    #In order traversal list. You can pass a function to traversal
    #The tree as needed by client code
    def traversalTree(self, traversalFunction = None, root = None):
        if(traversalFunction is None):
            #Returns a list of nodes in preOrder by default
            return self.__InOrderTraversal(self.root)
        else:
            #Returns a list of nodes in the order that the users wants to
            return traversalFunction(self.root)

    #Returns an string of all the nodes labels in the list 
    #In Order Traversal
    def __str__(self):
        list = self.__InOrderTraversal(self.root)
        str = ""
        for x in list:
            str = str + " " + x.getLabel().__str__()
        return str

def InPreOrder(curr_node):
    nodeList = []
    if curr_node is not None:
        nodeList = nodeList + InPreOrder(curr_node.getLeft())
        nodeList.insert(0, curr_node.getLabel())
        nodeList = nodeList + InPreOrder(curr_node.getRight())
    return nodeList

def testBinarySearchTree():
    r'''
    Example
                  8
                 / \
                3   10
               / \    \
              1   6    14
                 / \   /
                4   7 13 
    '''

    r'''
    Example After Deletion
                  7
                 / \
                1   4

    '''
    t = BinarySearchTree()
    t.insert(8)
    t.insert(3)
    t.insert(6)
    t.insert(1)
    t.insert(10)
    t.insert(14)
    t.insert(13)
    t.insert(4)
    t.insert(7)

    #Prints all the elements of the list in order traversal
    print(t.__str__())

    if(t.getNode(6) is not None):
        print("The label 6 exists")
    else:
        print("The label 6 doesn't exist")

    if(t.getNode(-1) is not None):
        print("The label -1 exists")
    else:
        print("The label -1 doesn't exist")

    if(not t.empty()):
        print(("Max Value: ", t.getMax().getLabel()))
        print(("Min Value: ", t.getMin().getLabel()))

    t.delete(13)
    t.delete(10)
    t.delete(8)
    t.delete(3)
    t.delete(6)
    t.delete(14)

    #Gets all the elements of the tree In pre order
    #And it prints them
    list = t.traversalTree(InPreOrder, t.root)
    for x in list:
        print(x)

if __name__ == "__main__":
    testBinarySearchTree()

A good implementation of binary search tree, taken from here:

'''
A binary search Tree
'''
from __future__ import print_function
class Node:

    def __init__(self, label, parent):
        self.label = label
        self.left = None
        self.right = None
        #Added in order to delete a node easier
        self.parent = parent

    def getLabel(self):
        return self.label

    def setLabel(self, label):
        self.label = label

    def getLeft(self):
        return self.left

    def setLeft(self, left):
        self.left = left

    def getRight(self):
        return self.right

    def setRight(self, right):
        self.right = right

    def getParent(self):
        return self.parent

    def setParent(self, parent):
        self.parent = parent

class BinarySearchTree:

    def __init__(self):
        self.root = None

    def insert(self, label):
        # Create a new Node
        new_node = Node(label, None)
        # If Tree is empty
        if self.empty():
            self.root = new_node
        else:
            #If Tree is not empty
            curr_node = self.root
            #While we don't get to a leaf
            while curr_node is not None:
                #We keep reference of the parent node
                parent_node = curr_node
                #If node label is less than current node
                if new_node.getLabel() < curr_node.getLabel():
                #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
            #We insert the new node in a leaf
            if new_node.getLabel() < parent_node.getLabel():
                parent_node.setLeft(new_node)
            else:
                parent_node.setRight(new_node)
            #Set parent to the new node
            new_node.setParent(parent_node)      

    def delete(self, label):
        if (not self.empty()):
            #Look for the node with that label
            node = self.getNode(label)
            #If the node exists
            if(node is not None):
                #If it has no children
                if(node.getLeft() is None and node.getRight() is None):
                    self.__reassignNodes(node, None)
                    node = None
                #Has only right children
                elif(node.getLeft() is None and node.getRight() is not None):
                    self.__reassignNodes(node, node.getRight())
                #Has only left children
                elif(node.getLeft() is not None and node.getRight() is None):
                    self.__reassignNodes(node, node.getLeft())
                #Has two children
                else:
                    #Gets the max value of the left branch
                    tmpNode = self.getMax(node.getLeft())
                    #Deletes the tmpNode
                    self.delete(tmpNode.getLabel())
                    #Assigns the value to the node to delete and keesp tree structure
                    node.setLabel(tmpNode.getLabel())

    def getNode(self, label):
        curr_node = None
        #If the tree is not empty
        if(not self.empty()):
            #Get tree root
            curr_node = self.getRoot()
            #While we don't find the node we look for
            #I am using lazy evaluation here to avoid NoneType Attribute error
            while curr_node is not None and curr_node.getLabel() is not label:
                #If node label is less than current node
                if label < curr_node.getLabel():
                    #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
        return curr_node

    def getMax(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the right branch
            curr_node = self.getRoot()
        if(not self.empty()):
            while(curr_node.getRight() is not None):
                curr_node = curr_node.getRight()
        return curr_node

    def getMin(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the left branch
            curr_node = self.getRoot()
        if(not self.empty()):
            curr_node = self.getRoot()
            while(curr_node.getLeft() is not None):
                curr_node = curr_node.getLeft()
        return curr_node

    def empty(self):
        if self.root is None:
            return True
        return False

    def __InOrderTraversal(self, curr_node):
        nodeList = []
        if curr_node is not None:
            nodeList.insert(0, curr_node)
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())
        return nodeList

    def getRoot(self):
        return self.root

    def __isRightChildren(self, node):
        if(node == node.getParent().getRight()):
            return True
        return False

    def __reassignNodes(self, node, newChildren):
        if(newChildren is not None):
            newChildren.setParent(node.getParent())
        if(node.getParent() is not None):
            #If it is the Right Children
            if(self.__isRightChildren(node)):
                node.getParent().setRight(newChildren)
            else:
                #Else it is the left children
                node.getParent().setLeft(newChildren)

    #This function traversal the tree. By default it returns an
    #In order traversal list. You can pass a function to traversal
    #The tree as needed by client code
    def traversalTree(self, traversalFunction = None, root = None):
        if(traversalFunction is None):
            #Returns a list of nodes in preOrder by default
            return self.__InOrderTraversal(self.root)
        else:
            #Returns a list of nodes in the order that the users wants to
            return traversalFunction(self.root)

    #Returns an string of all the nodes labels in the list 
    #In Order Traversal
    def __str__(self):
        list = self.__InOrderTraversal(self.root)
        str = ""
        for x in list:
            str = str + " " + x.getLabel().__str__()
        return str

def InPreOrder(curr_node):
    nodeList = []
    if curr_node is not None:
        nodeList = nodeList + InPreOrder(curr_node.getLeft())
        nodeList.insert(0, curr_node.getLabel())
        nodeList = nodeList + InPreOrder(curr_node.getRight())
    return nodeList

def testBinarySearchTree():
    r'''
    Example
                  8
                 / \
                3   10
               / \    \
              1   6    14
                 / \   /
                4   7 13 
    '''

    r'''
    Example After Deletion
                  7
                 / \
                1   4

    '''
    t = BinarySearchTree()
    t.insert(8)
    t.insert(3)
    t.insert(6)
    t.insert(1)
    t.insert(10)
    t.insert(14)
    t.insert(13)
    t.insert(4)
    t.insert(7)

    #Prints all the elements of the list in order traversal
    print(t.__str__())

    if(t.getNode(6) is not None):
        print("The label 6 exists")
    else:
        print("The label 6 doesn't exist")

    if(t.getNode(-1) is not None):
        print("The label -1 exists")
    else:
        print("The label -1 doesn't exist")

    if(not t.empty()):
        print(("Max Value: ", t.getMax().getLabel()))
        print(("Min Value: ", t.getMin().getLabel()))

    t.delete(13)
    t.delete(10)
    t.delete(8)
    t.delete(3)
    t.delete(6)
    t.delete(14)

    #Gets all the elements of the tree In pre order
    #And it prints them
    list = t.traversalTree(InPreOrder, t.root)
    for x in list:
        print(x)

if __name__ == "__main__":
    testBinarySearchTree()

回答 14

我想展示@apadana方法的一种变体,当有大量节点时,它会更有用:

'''
Suppose we have the following tree
      10
    /    \
  11      9
 /  \     / \
7   12  15   8
'''
# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist = [10,11,7,9,15,8,12]; n = []
for i in range(len(nlist)): n.append(Node(nlist[i]))

# Step 2 - Set each node position
n[0].left  = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]

I want to show a variation of @apadana’s method, which is more useful when there is a considerable number of nodes:

'''
Suppose we have the following tree
      10
    /    \
  11      9
 /  \     / \
7   12  15   8
'''
# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist = [10,11,7,9,15,8,12]; n = []
for i in range(len(nlist)): n.append(Node(nlist[i]))

# Step 2 - Set each node position
n[0].left  = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]

回答 15

class Node:
    """
    single Node for tree
    """

    def __init__(self, data):
        self.data = data
        self.right = None
        self.left = None


class binaryTree:
    """
    binary tree implementation
    """

    def __init__(self):
        self.root = None

    def push(self, element, node=None):
        if node is None:
            node = self.root

        if self.root is None:
            self.root = Node(element)

        else:
            if element < node.data:
                if node.left is not None:
                    self.push(element, node.left)
                else:
                    node.left = Node(element)
            else:
                if node.right is not None:
                    self.push(element, node.right)
                else:
                    node.right = Node(element)

    def __str__(self):
        self.printInorder(self.root)
        return "\n"

    def printInorder(self, node):
        """
        print tree in inorder
        """
        if node is not None:
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)


def main():
    """
    Main code and logic comes here
    """
    tree = binaryTree()
    tree.push(5)
    tree.push(3)
    tree.push(1)
    tree.push(3)
    tree.push(0)
    tree.push(2)
    tree.push(9)
    tree.push(10)
    print(tree)


if __name__ == "__main__":
    main()
class Node:
    """
    single Node for tree
    """

    def __init__(self, data):
        self.data = data
        self.right = None
        self.left = None


class binaryTree:
    """
    binary tree implementation
    """

    def __init__(self):
        self.root = None

    def push(self, element, node=None):
        if node is None:
            node = self.root

        if self.root is None:
            self.root = Node(element)

        else:
            if element < node.data:
                if node.left is not None:
                    self.push(element, node.left)
                else:
                    node.left = Node(element)
            else:
                if node.right is not None:
                    self.push(element, node.right)
                else:
                    node.right = Node(element)

    def __str__(self):
        self.printInorder(self.root)
        return "\n"

    def printInorder(self, node):
        """
        print tree in inorder
        """
        if node is not None:
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)


def main():
    """
    Main code and logic comes here
    """
    tree = binaryTree()
    tree.push(5)
    tree.push(3)
    tree.push(1)
    tree.push(3)
    tree.push(0)
    tree.push(2)
    tree.push(9)
    tree.push(10)
    print(tree)


if __name__ == "__main__":
    main()

回答 16

Python中的二叉树

 class Tree(object):
    def __init__(self):
        self.data=None
        self.left=None
        self.right=None
    def insert(self, x, root):
        if root==None:
            t=node(x)
            t.data=x
            t.right=None
            t.left=None
            root=t
            return root
        elif x<root.data:
            root.left=self.insert(x, root.left)
        else:
            root.right=self.insert(x, root.right)
        return root

    def printTree(self, t):
        if t==None:
            return

        self.printTree(t.left)
        print t.data
        self.printTree(t.right)

class node(object):
    def __init__(self, x):
        self.x=x

bt=Tree()
root=None
n=int(raw_input())
a=[]
for i in range(n):
    a.append(int(raw_input()))
for i in range(n):
    root=bt.insert(a[i], root)
bt.printTree(root)

Binary Tree in Python

 class Tree(object):
    def __init__(self):
        self.data=None
        self.left=None
        self.right=None
    def insert(self, x, root):
        if root==None:
            t=node(x)
            t.data=x
            t.right=None
            t.left=None
            root=t
            return root
        elif x<root.data:
            root.left=self.insert(x, root.left)
        else:
            root.right=self.insert(x, root.right)
        return root

    def printTree(self, t):
        if t==None:
            return

        self.printTree(t.left)
        print t.data
        self.printTree(t.right)

class node(object):
    def __init__(self, x):
        self.x=x

bt=Tree()
root=None
n=int(raw_input())
a=[]
for i in range(n):
    a.append(int(raw_input()))
for i in range(n):
    root=bt.insert(a[i], root)
bt.printTree(root)

回答 17

这是一个简单的解决方案,可以使用递归方法来构建二叉树,以在下面的代码中使用遍历顺序来显示树。

class Node(object):

    def __init__(self):
        self.left = None
        self.right = None
        self.value = None
    @property
    def get_value(self):
        return self.value

    @property
    def get_left(self):
        return self.left

    @property
    def get_right(self):
        return self.right

    @get_left.setter
    def set_left(self, left_node):
        self.left = left_node
    @get_value.setter
    def set_value(self, value):
        self.value = value
    @get_right.setter
    def set_right(self, right_node):
        self.right = right_node



    def create_tree(self):
        _node = Node() #creating new node.
        _x = input("Enter the node data(-1 for null)")
        if(_x == str(-1)): #for defining no child.
            return None
        _node.set_value = _x #setting the value of the node.
        print("Enter the left child of {}".format(_x))
        _node.set_left = self.create_tree() #setting the left subtree
        print("Enter the right child of {}".format(_x))
        _node.set_right = self.create_tree() #setting the right subtree.

        return _node

    def pre_order(self, root):
        if root is not None:
            print(root.get_value)
            self.pre_order(root.get_left)
            self.pre_order(root.get_right)

if __name__ == '__main__':
    node = Node()
    root_node = node.create_tree()
    node.pre_order(root_node)

代码取自:Python中的二叉树

Here is a simple solution which can be used to build a binary tree using a recursive approach to display the tree in order traversal has been used in the below code.

class Node(object):

    def __init__(self):
        self.left = None
        self.right = None
        self.value = None
    @property
    def get_value(self):
        return self.value

    @property
    def get_left(self):
        return self.left

    @property
    def get_right(self):
        return self.right

    @get_left.setter
    def set_left(self, left_node):
        self.left = left_node
    @get_value.setter
    def set_value(self, value):
        self.value = value
    @get_right.setter
    def set_right(self, right_node):
        self.right = right_node



    def create_tree(self):
        _node = Node() #creating new node.
        _x = input("Enter the node data(-1 for null)")
        if(_x == str(-1)): #for defining no child.
            return None
        _node.set_value = _x #setting the value of the node.
        print("Enter the left child of {}".format(_x))
        _node.set_left = self.create_tree() #setting the left subtree
        print("Enter the right child of {}".format(_x))
        _node.set_right = self.create_tree() #setting the right subtree.

        return _node

    def pre_order(self, root):
        if root is not None:
            print(root.get_value)
            self.pre_order(root.get_left)
            self.pre_order(root.get_right)

if __name__ == '__main__':
    node = Node()
    root_node = node.create_tree()
    node.pre_order(root_node)

Code taken from : Binary Tree in Python