Tag Archives: algorithm

How do you trace the path in a Breadth-First Search?

Question: How do you trace the path in a Breadth-First Search?

How do you trace the path of a Breadth-First Search, such that in the following example:

If searching for key 11, return the shortest list connecting 1 to 11.

[1, 4, 7, 11]

Answer 0

You should have a look at http://en.wikipedia.org/wiki/Breadth-first_search first.


Below is a quick implementation, in which I used a list of lists to represent the queue of paths.

# graph is in adjacent list representation
graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def bfs(graph, start, end):
    # maintain a queue of paths
    queue = []
    # push the first path into the queue
    queue.append([start])
    while queue:
        # get the first path from the queue
        path = queue.pop(0)
        # get the last node from the path
        node = path[-1]
        # path found
        if node == end:
            return path
        # enumerate all adjacent nodes, construct a new path and push it into the queue
        for adjacent in graph.get(node, []):
            new_path = list(path)
            new_path.append(adjacent)
            queue.append(new_path)

print bfs(graph, '1', '11')

Another approach would be maintaining a mapping from each node to its parent, and when inspecting the adjacent node, record its parent. When the search is done, simply backtrace according to the parent mapping.

graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def backtrace(parent, start, end):
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return path


def bfs(graph, start, end):
    parent = {}
    queue = []
    queue.append(start)
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if adjacent not in queue:
                parent[adjacent] = node # <<<<< record its parent
                queue.append(adjacent)

print bfs(graph, '1', '11')

The above code is based on the assumption that there are no cycles.
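
If the graph can contain cycles, a minimal adjustment (my own sketch, not part of the original answer) is to enqueue a neighbour only the first time it is seen; the parent dict already records which nodes have been seen, so it can double as the visited set:

def bfs_with_cycles(graph, start, end):
    parent = {start: None}               # doubles as the "visited" set
    queue = [start]
    while queue:
        node = queue.pop(0)
        if node == end:
            return backtrace(parent, start, end)
        for adjacent in graph.get(node, []):
            if adjacent not in parent:   # only enqueue nodes we have not seen yet
                parent[adjacent] = node
                queue.append(adjacent)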


Answer 1

I liked qiao’s first answer very much! The only thing missing here is to mark the vertices as visited.

Why do we need to do it?
Let’s imagine that there is another node, number 13, connected from node 11. Now our goal is to find node 13.
After a little bit of a run the queue will look like this:

[[1, 2, 6], [1, 3, 10], [1, 4, 7], [1, 4, 8], [1, 2, 5, 9], [1, 2, 5, 10]]

Note that there are TWO paths with node number 10 at the end, which means that the paths from node number 10 will be checked twice. In this case it doesn’t look so bad because node number 10 doesn’t have any children, but it could be really bad (even here we check that node twice for no reason).
Node number 13 isn’t in those paths, so the program won’t return before reaching the second path with node number 10 at the end, and we will recheck it.

All we are missing is a set to mark the visited nodes so we don’t check them again.
This is qiao’s code after the modification:

graph = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [10],
    4: [7, 8],
    5: [9, 10],
    7: [11, 12],
    11: [13]
}


def bfs(graph_to_search, start, end):
    queue = [[start]]
    visited = set()

    while queue:
        # Gets the first path in the queue
        path = queue.pop(0)

        # Gets the last node in the path
        vertex = path[-1]

        # Checks if we got to the end
        if vertex == end:
            return path
        # We check if the current node is already in the visited nodes set in order not to recheck it
        elif vertex not in visited:
            # enumerate all adjacent nodes, construct a new path and push it into the queue
            for current_neighbour in graph_to_search.get(vertex, []):
                new_path = list(path)
                new_path.append(current_neighbour)
                queue.append(new_path)

            # Mark the vertex as visited
            visited.add(vertex)


print bfs(graph, 1, 13)

The output of the program will be:

[1, 4, 7, 11, 13]

Without the unnecessary rechecks.


Answer 2

Very easy code. You keep appending the path each time you discover a node.

graph = {
         'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])
         }
def retunShortestPath(graph, start, end):

    queue = [(start,[start])]
    visited = set()

    while queue:
        vertex, path = queue.pop(0)
        visited.add(vertex)
        for node in graph[vertex]:
            if node == end:
                return path + [end]
            else:
                if node not in visited:
                    visited.add(node)
                    queue.append((node, path + [node]))

Answer 3

I thought I’d try coding this up for fun:

graph = {
        '1': ['2', '3', '4'],
        '2': ['5', '6'],
        '5': ['9', '10'],
        '4': ['7', '8'],
        '7': ['11', '12']
        }

def bfs(graph, forefront, end):
    # assumes no cycles

    next_forefront = [(node, path + ',' + node) for i, path in forefront if i in graph for node in graph[i]]

    for node,path in next_forefront:
        if node==end:
            return path
    else:
        return bfs(graph,next_forefront,end)

print bfs(graph,[('1','1')],'11')

# >>>
# 1, 4, 7, 11

If you want cycles you could add this:

for i, j in forefront: # allow cycles, add this code
    if i in graph:
        del graph[i]

Answer 4

I like both @Qiao’s first answer and @Or’s addition. For the sake of a little less processing, I would like to add to Or’s answer.

In @Or’s answer, keeping track of visited nodes is great. We can also allow the program to exit sooner than it currently does. At some point in the for loop the current_neighbour will have to be the end, and once that happens the shortest path is found and the program can return.

I would modify the method as follows; pay close attention to the for loop:

graph = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [10],
    4: [7, 8],
    5: [9, 10],
    7: [11, 12],
    11: [13]
}


def bfs(graph_to_search, start, end):
    queue = [[start]]
    visited = set()

    while queue:
        # Gets the first path in the queue
        path = queue.pop(0)

        # Gets the last node in the path
        vertex = path[-1]

        # Checks if we got to the end
        if vertex == end:
            return path
        # We check if the current node is already in the visited nodes set in order not to recheck it
        elif vertex not in visited:
            # enumerate all adjacent nodes, construct a new path and push it into the queue
            for current_neighbour in graph_to_search.get(vertex, []):
                new_path = list(path)
                new_path.append(current_neighbour)
                queue.append(new_path)

                # No need to visit other neighbours. Return at once
                if current_neighbour == end:
                    return new_path

            # Mark the vertex as visited
            visited.add(vertex)


print bfs(graph, 1, 13)

The output and everything else will be the same. However, the code will take less time to process. This is especially useful on larger graphs. I hope this helps someone in the future.


What is a good rate-limiting algorithm?

Question: What is a good rate-limiting algorithm?

I could use some pseudo-code, or better, Python. I am trying to implement a rate-limiting queue for a Python IRC bot, and it partially works, but if someone triggers fewer messages than the limit (e.g., the rate limit is 5 messages per 8 seconds, and the person triggers only 4), and the next trigger is over the 8 seconds (e.g., 16 seconds later), the bot sends the message, but the queue becomes full and the bot waits 8 seconds, even though that isn’t needed since the 8-second period has lapsed.


Answer 0

Here is the simplest algorithm, if you just want to drop messages when they arrive too quickly (instead of queuing them, which makes sense because the queue might get arbitrarily large):

rate = 5.0; // unit: messages
per  = 8.0; // unit: seconds
allowance = rate; // unit: messages
last_check = now(); // floating-point, e.g. usec accuracy. Unit: seconds

when (message_received):
  current = now();
  time_passed = current - last_check;
  last_check = current;
  allowance += time_passed * (rate / per);
  if (allowance > rate):
    allowance = rate; // throttle
  if (allowance < 1.0):
    discard_message();
  else:
    forward_message();
    allowance -= 1.0;

There are no data structures, timers etc. in this solution and it works cleanly :) To see this, ‘allowance’ grows at a rate of 5/8 units per second at most, i.e. at most five units per eight seconds. Every message that is forwarded deducts one unit, so you can’t send more than five messages per eight seconds.

Note that rate should be an integer, i.e. without non-zero decimal part, or the algorithm won’t work correctly (actual rate will not be rate/per). E.g. rate=0.5; per=1.0; does not work because allowance will never grow to 1.0. But rate=1.0; per=2.0; works fine.


Answer 1

Use this decorator @RateLimited(ratepersec) before your function that enqueues.

Basically, this checks if 1/rate secs have passed since the last time and if not, waits the remainder of the time, otherwise it doesn’t wait. This effectively limits you to rate/sec. The decorator can be applied to any function you want rate-limited.

In your case, if you want a maximum of 5 messages per 8 seconds, use @RateLimited(0.625) before your sendToQueue function.

import time

def RateLimited(maxPerSecond):
    minInterval = 1.0 / float(maxPerSecond)
    def decorate(func):
        lastTimeCalled = [0.0]
        def rateLimitedFunction(*args,**kargs):
            elapsed = time.clock() - lastTimeCalled[0]
            leftToWait = minInterval - elapsed
            if leftToWait>0:
                time.sleep(leftToWait)
            ret = func(*args,**kargs)
            lastTimeCalled[0] = time.clock()
            return ret
        return rateLimitedFunction
    return decorate

@RateLimited(2)  # 2 per second at most
def PrintNumber(num):
    print num

if __name__ == "__main__":
    print "This should print 1,2,3... at about 2 per second."
    for i in range(1,100):
        PrintNumber(i)

Answer 2

A Token Bucket is fairly simple to implement.

Start with a bucket with 5 tokens.

Every 5/8 seconds: If the bucket has less than 5 tokens, add one.

Each time you want to send a message: If the bucket has ≥1 token, take one token out and send the message. Otherwise, wait/drop the message/whatever.

(obviously, in actual code, you’d use an integer counter instead of real tokens and you can optimize out the every 5/8s step by storing timestamps)
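
A rough sketch of that optimization in Python (my own illustration, not taken from the answer; it uses the 5-per-8-seconds numbers from the question and a float token count refilled lazily from a timestamp):

import time

class TokenBucket(object):
    def __init__(self, capacity=5, fill_rate=5.0 / 8.0):
        self.capacity = capacity      # maximum burst size, in tokens
        self.fill_rate = fill_rate    # tokens added per second
        self.tokens = capacity
        self.last = time.time()

    def consume(self):
        now = time.time()
        # add the tokens that would have accumulated since the last call
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # OK to send
        return False                  # rate limited: wait/drop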


Reading the question again, if the rate limit is fully reset each 8 seconds, then here is a modification:

Start with a timestamp, last_send, at a time long ago (e.g., at the epoch). Also, start with the same 5-token bucket.

Strike the every 5/8 seconds rule.

Each time you send a message: First, check if last_send ≥ 8 seconds ago. If so, fill the bucket (set it to 5 tokens). Second, if there are tokens in the bucket, send the message (otherwise, drop/wait/etc.). Third, set last_send to now.

That should work for that scenario.
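
A small sketch of that modification (again my own illustration; send() here is a hypothetical function that actually transmits the message):

import time

tokens = 5
last_send = 0.0                  # a time long ago

def try_send(message):
    global tokens, last_send
    now = time.time()
    if now - last_send >= 8.0:   # first: refill if the last send was 8+ seconds ago
        tokens = 5
    if tokens > 0:               # second: send only if a token is available
        tokens -= 1
        last_send = now          # third: remember when we last sent
        send(message)            # send() is hypothetical
    # otherwise drop/wait/etc.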


I’ve actually written an IRC bot using a strategy like this (the first approach). It’s in Perl, not Python, but here is some code to illustrate:

The first part here handles adding tokens to the bucket. You can see the optimization of adding tokens based on time (2nd to last line), and then the last line clamps the bucket contents to the maximum (MESSAGE_BURST).

    my $start_time = time;
    ...
    # Bucket handling
    my $bucket = $conn->{fujiko_limit_bucket};
    my $lasttx = $conn->{fujiko_limit_lasttx};
    $bucket += ($start_time-$lasttx)/MESSAGE_INTERVAL;
    ($bucket <= MESSAGE_BURST) or $bucket = MESSAGE_BURST;

$conn is a data structure which is passed around. This is inside a method that runs routinely (it calculates when the next time it’ll have something to do, and sleeps either that long or until it gets network traffic). The next part of the method handles sending. It is rather complicated, because messages have priorities associated with them.

    # Queue handling. Start with the ultimate queue.
    my $queues = $conn->{fujiko_queues};
    foreach my $entry (@{$queues->[PRIORITY_ULTIMATE]}) {
            # Ultimate is special. We run ultimate no matter what. Even if
            # it sends the bucket negative.
            --$bucket;
            $entry->{code}(@{$entry->{args}});
    }
    $queues->[PRIORITY_ULTIMATE] = [];

That’s the first queue, which is run no matter what. Even if it gets our connection killed for flooding. Used for extremely important things, like responding to the server’s PING. Next, the rest of the queues:

    # Continue to the other queues, in order of priority.
    QRUN: for (my $pri = PRIORITY_HIGH; $pri >= PRIORITY_JUNK; --$pri) {
            my $queue = $queues->[$pri];
            while (scalar(@$queue)) {
                    if ($bucket < 1) {
                            # continue later.
                            $need_more_time = 1;
                            last QRUN;
                    } else {
                            --$bucket;
                            my $entry = shift @$queue;
                            $entry->{code}(@{$entry->{args}});
                    }
            }
    }

Finally, the bucket status is saved back to the $conn data structure (actually a bit later in the method; it first calculates how soon it’ll have more work)

    # Save status.
    $conn->{fujiko_limit_bucket} = $bucket;
    $conn->{fujiko_limit_lasttx} = $start_time;

As you can see, the actual bucket handling code is very small — about four lines. The rest of the code is priority queue handling. The bot has priority queues so that e.g., someone chatting with it can’t prevent it from doing its important kick/ban duties.


Answer 3

To block processing until the message can be sent, thus queuing up further messages, Antti’s beautiful solution may also be modified like this:

rate = 5.0; // unit: messages
per  = 8.0; // unit: seconds
allowance = rate; // unit: messages
last_check = now(); // floating-point, e.g. usec accuracy. Unit: seconds

when (message_received):
  current = now();
  time_passed = current - last_check;
  last_check = current;
  allowance += time_passed * (rate / per);
  if (allowance > rate):
    allowance = rate; // throttle
  if (allowance < 1.0):
    time.sleep( (1-allowance) * (per/rate))
    forward_message();
    allowance = 0.0;
  else:
    forward_message();
    allowance -= 1.0;

It just waits until enough allowance is there to send the message. To avoid starting with two times the rate, allowance may also be initialized with 0.


Answer 4

Keep the time that the last five lines were sent. Hold the queued messages until the time the fifth-most-recent message (if it exists) is at least 8 seconds in the past (with last_five as an array of times):

now = time.time()
if len(last_five) == 0 or (now - last_five[-1]) >= 8.0:
    last_five.insert(0, now)
    send_message(msg)
if len(last_five) > 5:
    last_five.pop()

Answer 5

One solution is to attach a timestamp to each queue item and to discard the item after 8 seconds have passed. You can perform this check each time the queue is added to.

This only works if you limit the queue size to 5 and discard any additions whilst the queue is full.
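
A minimal sketch of that idea (my own illustration, using the 5-message / 8-second numbers from the question):

import time

MAX_SIZE = 5
MAX_AGE = 8.0                    # seconds

queue = []                       # list of (timestamp, message) pairs

def try_enqueue(message):
    now = time.time()
    # discard items older than 8 seconds
    queue[:] = [(t, m) for (t, m) in queue if now - t < MAX_AGE]
    if len(queue) < MAX_SIZE:
        queue.append((now, message))
        return True
    return False                 # queue full: discard the addition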


Answer 6

If anyone is still interested, I use this simple callable class in conjunction with a timed LRU key-value storage to limit the request rate per IP. It uses a deque, but it can be rewritten to use a list instead.

from collections import deque
import time


class RateLimiter:
    def __init__(self, maxRate=5, timeUnit=1):
        self.timeUnit = timeUnit
        self.deque = deque(maxlen=maxRate)

    def __call__(self):
        if self.deque.maxlen == len(self.deque):
            cTime = time.time()
            if cTime - self.deque[0] > self.timeUnit:
                self.deque.append(cTime)
                return False
            else:
                return True
        self.deque.append(time.time())
        return False

r = RateLimiter()
for i in range(0,100):
    time.sleep(0.1)
    print(i, "block" if r() else "pass")

Answer 7

Just a Python implementation of the code from the accepted answer.

import time

class Object(object):
    pass

def get_throttler(rate, per):
    scope = Object()
    scope.allowance = rate
    scope.last_check = time.time()
    def throttler(fn):
        current = time.time()
        time_passed = current - scope.last_check;
        scope.last_check = current;
        scope.allowance = scope.allowance + time_passed * (rate / per)
        if (scope.allowance > rate):
          scope.allowance = rate
        if (scope.allowance < 1):
          pass
        else:
          fn()
          scope.allowance = scope.allowance - 1
    return throttler
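
A possible usage sketch (my own example, not part of the answer); calls beyond the allowance are simply skipped, mirroring the discard behaviour of the pseudocode:

def send():
    print("sending at %f" % time.time())

throttled_send = get_throttler(5.0, 8.0)   # at most 5 calls per 8 seconds
for _ in range(20):
    throttled_send(send)
    time.sleep(0.5)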

Answer 8

How about this:

long check_time = System.currentTimeMillis();
int msgs_sent_count = 0;

private boolean isRateLimited(int msgs_per_sec) {
    if (System.currentTimeMillis() - check_time > 1000) {
        check_time = System.currentTimeMillis();
        msgs_sent_count = 0;
    }

    if (msgs_sent_count > (msgs_per_sec - 1)) {
        return true;
    } else {
        msgs_sent_count++;
    }

    return false;
}

Answer 9

I needed a variation in Scala. Here it is:

case class Limiter[-A, +B](callsPerSecond: (Double, Double), f: A ⇒ B) extends (A ⇒ B) {

  import Thread.sleep
  private def now = System.currentTimeMillis / 1000.0
  private val (calls, sec) = callsPerSecond
  private var allowance  = 1.0
  private var last = now

  def apply(a: A): B = {
    synchronized {
      val t = now
      val delta_t = t - last
      last = t
      allowance += delta_t * (calls / sec)
      if (allowance > calls)
        allowance = calls
      if (allowance < 1d) {
        sleep(((1 - allowance) * (sec / calls) * 1000d).toLong)
      }
      allowance -= 1
    }
    f(a)
  }

}

Here is how it can be used:

val f = Limiter((5d, 8d), { 
  _: Unit ⇒ 
    println(System.currentTimeMillis) 
})
while(true){f(())}

When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

Question: When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns ['']. Is there any specific reason for such a difference?

And is there any more convenient way to count lines in a string?


Answer 0

Question: I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list [], while ''.split('\n') returns [''].

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) will return an empty list because the whitespace is eaten and there are no values to put in the result list.

In contrast, the second mode (with an argument such as \n) will produce the first empty field. Consider if you had written '\n'.split('\n'), you would get two fields (one split, gives you two halves).

Question: Is there any specific reason for such a difference?

This first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:

>>> data = '''\
Shasta      California     14,200
McKinley    Alaska         20,300
Fuji        Japan          12,400
'''
>>> for line in data.splitlines():
        print line.split()

['Shasta', 'California', '14,200']
['McKinley', 'Alaska', '20,300']
['Fuji', 'Japan', '12,400']

The second mode is useful for delimited data such as CSV where repeated commas denote empty fields. For example:

>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
Tim,,,USA
'''
>>> for line in data.splitlines():
        print line.split(',')

['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['Tim', '', '', 'USA']

Note, the number of result fields is one greater than the number of delimiters. Think of cutting a rope. If you make no cuts, you have one piece. Making one cut, gives two pieces. Making two cuts, gives three pieces. And so it is with Python’s str.split(delimiter) method:

>>> ''.split(',')       # No cuts
['']
>>> ','.split(',')      # One cut
['', '']
>>> ',,'.split(',')     # Two cuts
['', '', '']

Question: And is there any more convenient way to count lines in a string?

Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both ways will give the same answer unless the final line is missing the \n. If the final newline is missing, the str.splitlines approach will give the accurate answer. A faster technique that is also accurate uses the count method but then corrects it for the final newline:

>>> data = '''\
Line 1
Line 2
Line 3
Line 4'''

>>> data.count('\n')                               # Inaccurate
3
>>> len(data.splitlines())                         # Accurate, but slow
4
>>> data.count('\n') + (not data.endswith('\n'))   # Accurate and fast
4    

Question from @Kaz: Why the heck are two very different algorithms shoe-horned into a single function?

The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn’t “terrible” either. For the most part, Guido’s API design choices have stood the test of time.

The current API is not without advantages. Consider strings such as:

ps_aux_header  = "USER               PID  %CPU %MEM      VSZ"
patient_header = "name,age,height,weight"

When asked to break these strings into fields, people tend to describe both using the same English word, “split”. When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as “splits a line into fields”.

Microsoft Excel’s text-to-columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.


Answer 1

It seems to simply be the way it’s supposed to work, according to the documentation:

Splitting an empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].

So, to make it clearer, the split() function implements two different splitting algorithms, and uses the presence of an argument to decide which one to run. This might be because it allows optimizing the one for no arguments more than the one with arguments; I don’t know.
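
For example (an illustrative interactive session):

>>> ''.split()          # whitespace algorithm: nothing to return
[]
>>> '   '.split()       # only whitespace: still nothing
[]
>>> ''.split('\n')      # explicit separator: one (empty) field
['']
>>> ' a  b '.split()    # runs of whitespace collapsed, ends trimmed
['a', 'b']
>>> ' a  b '.split(' ') # explicit separator keeps the empty fields
['', 'a', '', 'b', '']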


Answer 2

.split() without parameters tries to be clever. It splits on any whitespace (tabs, spaces, line feeds, etc.), and as a result it also skips all empty strings.

>>> "  fii    fbar \n bopp ".split()
['fii', 'fbar', 'bopp']

Essentially, .split() without parameters is used to extract words from a string, as opposed to .split() with parameters, which just takes a string and splits it.

That’s the reason for the difference.

And yeah, counting lines by splitting is not an efficient way. Count the number of line feeds, and add one if the string doesn’t end with a line feed.
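
In code, that suggestion might look like this (a sketch; note an empty string would still count as one line):

line_count = some_string.count('\n') + (0 if some_string.endswith('\n') else 1)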


Answer 3

Use count():

s = "Line 1\nLine2\nLine3"
n_lines = s.count('\n') + 1

Answer 4

>>> print str.split.__doc__
S.split([sep [,maxsplit]]) -> list of strings

Return a list of the words in the string S, using sep as the
delimiter string.  If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are removed
from the result.

Note the last sentence.

To count lines you can simply count how many \n there are:

line_count = some_string.count('\n') + (some_string[-1] != '\n')

The last part takes into account a final line that does not end with \n, even though this means that Hello, World! and Hello, World!\n have the same line count (which for me is reasonable); otherwise you can simply add 1 to the count of \n.


Answer 5

To count lines, you can count the number of line breaks:

n_lines = sum(1 for s in the_string if s == "\n") + 1 # add 1 for last line

Edit:

The other answer with built-in count is more suitable, actually


Rolling or sliding window iterator?

Question: Rolling or sliding window iterator?

I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. Default Python iteration can be considered a special case, where the window length is 1. I’m currently using the following code. Does anyone have a more Pythonic, less verbose, or more efficient method for doing this?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:

   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

Answer 0

There’s one in an old version of the Python docs with itertools examples:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

The one from the docs is a little more succinct and uses itertools to greater effect I imagine.
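
For the question's input it would be used like this (my illustration, Python 2 as in the question):

>>> list(window(xrange(6), 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]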


Answer 1

This seems tailor-made for a collections.deque since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list you shouldn’t be slicing twice; instead, you should probably just pop(0) from the list and append() the new item.

Here is an optimized deque-based implementation patterned after your original:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

In my tests it handily beats everything else posted here most of the time, though pillmuncher’s tee version beats it for large iterables and small windows. On larger windows, the deque pulls ahead again in raw speed.

Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque’s strength (iterating from one item to the next is fast, so this loop ran a full 20% faster than the next fastest method, pillmuncher’s). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.


Answer 2

I like tee():

from itertools import tee, izip

def window(iterable, size):
    iters = tee(iterable, size)
    for i in xrange(1, size):
        for each in iters[i:]:
            next(each, None)
    return izip(*iters)

for each in window(xrange(6), 3):
    print list(each)

gives:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

Answer 3

Here’s a generalization that adds support for step, fillvalue parameters:

from collections import deque
from itertools import islice

def sliding_window(iterable, size=2, step=1, fillvalue=None):
    if size < 0 or step < 1:
        raise ValueError
    it = iter(iterable)
    q = deque(islice(it, size), maxlen=size)
    if not q:
        return  # empty iterable or size == 0
    q.extend(fillvalue for _ in range(size - len(q)))  # pad to size
    while True:
        yield iter(q)  # iter() to avoid accidental outside modifications
        try:
            q.append(next(it))
        except StopIteration: # Python 3.5 pep 479 support
            return
        q.extend(next(it, fillvalue) for _ in range(step - 1))

It yields chunks of size items at a time, rolling step positions per iteration, and padding each chunk with fillvalue if necessary. Example for size=4, step=3, fillvalue='*':

 [a b c d]e f g h i j k l m n o p q r s t u v w x y z
  a b c[d e f g]h i j k l m n o p q r s t u v w x y z
  a b c d e f[g h i j]k l m n o p q r s t u v w x y z
  a b c d e f g h i[j k l m]n o p q r s t u v w x y z
  a b c d e f g h i j k l[m n o p]q r s t u v w x y z
  a b c d e f g h i j k l m n o[p q r s]t u v w x y z
  a b c d e f g h i j k l m n o p q r[s t u v]w x y z
  a b c d e f g h i j k l m n o p q r s t u[v w x y]z
  a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]

For an example use case of the step parameter, see Processing a large .txt file in python efficiently.
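
A small usage sketch (my own illustration; each yielded value is an iterator over the underlying deque, so materialize it before advancing):

>>> for w in sliding_window('abcdefgh', size=4, step=3, fillvalue='*'):
...     print(list(w))
['a', 'b', 'c', 'd']
['d', 'e', 'f', 'g']
['g', 'h', '*', '*']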


Answer 4

There is a library which does exactly what you need:

import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))

Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

Answer 5

Just a quick contribution.

Since the current python docs don’t have “window” in the itertool examples (i.e., at the bottom of http://docs.python.org/library/itertools.html), here’s a snippet based on the code for grouper, which is one of the examples given:

import itertools as it
def window(iterable, size):
    shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
    return it.izip(*shiftedStarts)

Basically, we create a series of sliced iterators, each with a starting point one spot further forward. Then, we zip these together. Note, this function returns a generator (it is not directly a generator itself).

Much like the appending-element and advancing-iterator versions above, the performance (i.e., which is best) varies with list size and window size. I like this one because it is a two-liner (it could be a one-liner, but I prefer naming concepts).

It turns out that the above code is wrong. It works if the parameter passed to iterable is a sequence but not if it is an iterator. If it is an iterator, the same iterator is shared (but not tee’d) among the islice calls and this breaks things badly.

Here is some fixed code:

import itertools as it
def window(iterable, size):
    itrs = it.tee(iterable, size)
    shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
    return it.izip(*shiftedStarts)

Also, one more version for the books. Instead of copying an iterator and then advancing copies many times, this version makes pairwise copies of each iterator as we move the starting position forward. Thus, iterator t provides both the “complete” iterator with starting point at t and also the basis for creating iterator t + 1:

import itertools as it
def window4(iterable, size):
    complete_itr, incomplete_itr = it.tee(iterable, 2)
    iters = [complete_itr]
    for i in xrange(1, size):
        incomplete_itr.next()
        complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
        iters.append(complete_itr)
    return it.izip(*iters)

Answer 6

Just to show how you can combine itertools recipes, I’m extending the pairwise recipe as directly as possible back into the window recipe using the consume recipe:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

def window(iterable, n=2):
    "s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
    iters = tee(iterable, n)
    # Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
    # slower for larger window sizes, while saving only small fixed "noop" cost
    for i, it in enumerate(iters):
        consume(it, i)
    return zip(*iters)

The window recipe is the same as for pairwise, it just replaces the single element “consume” on the second tee-ed iterator with progressively increasing consumes on n - 1 iterators. Using consume instead of wrapping each iterator in islice is marginally faster (for sufficiently large iterables) since you only pay the islice wrapping overhead during the consume phase, not during the process of extracting each window-ed value (so it’s bounded by n, not the number of items in iterable).

Performance-wise, compared to some other solutions, this is pretty good (and better than any of the other solutions I tested as it scales). Tested on Python 3.5.0, Linux x86-64, using ipython %timeit magic.

kindall's deque solution, tweaked for performance/correctness by using islice instead of a home-rolled generator expression and testing the resulting length so it doesn't yield results when the iterable is shorter than the window, as well as passing the maxlen of the deque positionally instead of by keyword (makes a surprising difference for smaller inputs):

>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop

Same as previous adapted kindall solution, but with each yield win changed to yield tuple(win) so storing results from the generator works without all stored results really being a view of the most recent result (all other reasonable solutions are safe in this scenario), and adding tuple=tuple to the function definition to move use of tuple from the B in LEGB to the L:

>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop

consume-based solution shown above:

>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop

Same as consume, but inlining else case of consume to avoid function call and n is None test to reduce runtime, particularly for small inputs where the setup overhead is a meaningful part of the work:

>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop
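(For reference, the windowinlineconsume that produced the numbers above isn't reproduced in this answer; below is a hedged sketch of what such an inlined variant can look like, and it may differ in detail from the code that was actually benchmarked.)

from itertools import islice, tee

def windowinlineconsume(iterable, n=2):
    "Sliding window with the else branch of consume inlined (sketch, not the benchmarked original)."
    iters = tee(iterable, n)
    for i, it in enumerate(iters):
        # inline of consume(it, i): advance this copy i steps at C speed
        next(islice(it, i, i), None)
    return zip(*iters)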

(Side-note: A variant on pairwise that uses tee with the default argument of 2 repeatedly to make nested tee objects, so any given iterator is only advanced once, not independently consumed an increasing number of times, similar to MrDrFenner's answer, is similar to non-inlined consume and slower than the inlined consume on all tests, so I've omitted those results for brevity).

As you can see, if you don’t care about the possibility of the caller needing to store results, my optimized version of kindall’s solution wins most of the time, except in the “large iterable, small window size case” (where inlined consume wins); it degrades quickly as the iterable size increases, while not degrading at all as the window size increases (every other solution degrades more slowly for iterable size increases, but also degrades for window size increases). It can even be adapted for the “need tuples” case by wrapping in map(tuple, ...), which runs ever so slightly slower than putting the tupling in the function, but it’s trivial (takes 1-5% longer) and lets you keep the flexibility of running faster when you can tolerate repeatedly returning the same value.

If you need safety against returns being stored, inlined consume wins on all but the smallest input sizes (with non-inlined consume being slightly slower but scaling similarly). The deque & tupling based solution wins only for the smallest inputs, due to smaller setup costs, and the gain is small; it degrades badly as the iterable gets longer.

For the record, the adapted version of kindall’s solution that yields tuples I used was:

def windowkindalltupled(iterable, n=2, tuple=tuple):
    it = iter(iterable)
    win = deque(islice(it, n), n)
    if len(win) < n:
        return
    append = win.append
    yield tuple(win)
    for e in it:
        append(e)
        yield tuple(win)

Drop the caching of tuple in the function definition line and the use of tuple in each yield to get the faster but less safe version.
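For completeness, the non-tupled windowkindall used for the first set of timings is presumably just this function minus the tuple calls; a hedged reconstruction (not shown verbatim in the original):

from collections import deque
from itertools import islice

def windowkindall(iterable, n=2):
    # Reconstruction: yields the deque itself, so every result is a view of the
    # same object; fast, but unsafe if the caller stores the yielded windows.
    it = iter(iterable)
    win = deque(islice(it, n), n)
    if len(win) < n:
        return
    append = win.append
    yield win
    for e in it:
        append(e)
        yield win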


回答 7

我将以下代码用作一个简单的滑动窗口,该窗口使用生成器来大大提高可读性。根据我的经验,到目前为止,它的速度已经足够用于生物信息学序列分析。

我将其包含在此处是因为我尚未看到使用此方法。同样,我对它的比较性能没有任何要求。

def slidingWindow(sequence, winSize, step=1):
    """Returns a generator that will iterate through
    the defined chunks of input sequence. Input sequence
    must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit (floor division keeps this an int on Python 3)
    numOfChunks = ((len(sequence) - winSize) // step) + 1

    # Do the work
    for i in range(0, numOfChunks * step, step):
        yield sequence[i:i + winSize]

I use the following code as a simple sliding window that uses generators to drastically increase readability. Its speed has so far been sufficient for use in bioinformatics sequence analysis in my experience.

I include it here because I didn’t see this method used yet. Again, I make no claims about its compared performance.

def slidingWindow(sequence, winSize, step=1):
    """Returns a generator that will iterate through
    the defined chunks of input sequence. Input sequence
    must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit (floor division keeps this an int on Python 3)
    numOfChunks = ((len(sequence) - winSize) // step) + 1

    # Do the work
    for i in range(0, numOfChunks * step, step):
        yield sequence[i:i + winSize]

回答 8

def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]

回答 9

对 deque 窗口方案的略微修改版本,使其成为真正的滚动窗口:它从只含一个元素开始填充,逐渐增长到最大窗口大小,然后在窗口左边缘接近序列末尾时逐渐缩小:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

这给

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]

a slightly modified version of the deque window, to make it a true rolling window. So that it starts being populated with just one element, then grows to its maximum window size, and then shrinks as its left edge comes near the end:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

this gives

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]

回答 10

def rolling_window(list, degree):
    for i in range(len(list)-degree+1):
        yield [list[i+o] for o in range(degree)]

这是为实现滚动平均(rolling average)功能而写的

def rolling_window(list, degree):
    for i in range(len(list)-degree+1):
        yield [list[i+o] for o in range(degree)]

Made this for a rolling average function
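For instance, a rolling average built on top of it might look like this (a hedged usage sketch; the sample data is illustrative):

data = [3, 5, 7, 2, 8, 10]
# mean of each length-3 window produced by rolling_window above
averages = [sum(win) / float(len(win)) for win in rolling_window(data, 3)]
print(averages)  # [5.0, 4.666..., 5.666..., 6.666...]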


回答 11

为什么不

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

它记录在Python doc中。您可以轻松地将其扩展到更大的窗口。

why not

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

It is documented in the Python docs. You can easily extend it to a wider window.
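For instance, a hedged sketch of one such extension to an n-wide window (my own generalization, combining tee and islice in the same spirit):

from itertools import islice, tee

def nwise(iterable, n=2):
    "s -> (s0, ..., s(n-1)), (s1, ..., sn), ...  (generalized pairwise)"
    iters = tee(iterable, n)
    # advance the i-th copy by i elements, then zip the copies together
    return zip(*(islice(it, i, None) for i, it in enumerate(iters)))

print(list(nwise(range(5), 3)))  # [(0, 1, 2), (1, 2, 3), (2, 3, 4)]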


回答 12

多个迭代器!

def window(seq, size, step=1):
    # initialize iterators
    iters = [iter(seq) for i in range(size)]
    # stagger iterators (without yielding)
    [next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
    while(True):
        yield [next(i) for i in iters]
        # next line does nothing for step = 1 (skips iterations for step > 1)
        [next(i) for i in iters for j in range(step-1)]

当序列耗尽时,next(it) 会引发 StopIteration;由于生成器会把向外传播的 StopIteration 当作正常结束处理(出于某些对我来说很神奇的原因),函数在这里就直接返回了,那些凑不成完整窗口的剩余值会被忽略。

无论如何,除了 @dansalmo 的解决方案之外,这是目前行数最少的方案;它唯一的要求是 seq 实现 __iter__ 或 __getitem__ 之一,并且不依赖 itertools 或 collections :)

Multiple iterators!

def window(seq, size, step=1):
    # initialize iterators
    iters = [iter(seq) for i in range(size)]
    # stagger iterators (without yielding)
    [next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
    while(True):
        yield [next(i) for i in iters]
        # next line does nothing for step = 1 (skips iterations for step > 1)
        [next(i) for i in iters for j in range(step-1)]

next(it) raises StopIteration when the sequence is finished; since a StopIteration that propagates out of a generator simply ends it (before PEP 479 turned this into a RuntimeError in Python 3.7+), the function effectively returns here, ignoring the leftover values that don't form a full window.

Anyway, this is the least-lines solution yet whose only requirement is that seq implement either __iter__ or __getitem__ and doesn’t rely on itertools or collections besides @dansalmo’s solution :)


回答 13

让我们变得懒惰吧!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

Let’s make it lazy!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

回答 14

#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size), 
strides = arr.strides*2)

"""Example output:

  [0, 1, 2]
  [1, 2, 3]
  [2, 3, 4]
  [3, 4, 5]

"""

#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size), 
strides = arr.strides*2)

"""Example output:

  [0, 1, 2]
  [1, 2, 3]
  [2, 3, 4]
  [3, 4, 5]

"""
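If I recall correctly, NumPy 1.20 and later also ship a ready-made helper that produces the same windows without writing strides by hand:

import numpy as np

arr = np.arange(6)
# sliding_window_view is available in NumPy >= 1.20 (to the best of my knowledge)
windows = np.lib.stride_tricks.sliding_window_view(arr, window_shape=3)
print(windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]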


回答 15

我测试了一些解决方案,然后提出了一个解决方案,发现我提出的解决方案是最快的,因此我想与大家分享。

import itertools
import sys

def windowed(l, stride):
    return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])

I tested a few solutions and one I came up with and found the one I came up with to be the fastest so I thought I would share it.

import itertools
import sys

def windowed(l, stride):
    return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])

回答 16

>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

回答 17

如何使用以下内容:

mylist = [1, 2, 3, 4, 5, 6, 7]

def sliding_window(l, window_size=2):
    if window_size > len(l):
        raise ValueError("Window size must be smaller or equal to the number of elements in the list.")

    t = []
    for i in xrange(0, window_size):
        t.append(l[i:])

    return zip(*t)

print sliding_window(mylist, 3)

输出:

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]

How about using the following:

mylist = [1, 2, 3, 4, 5, 6, 7]

def sliding_window(l, window_size=2):
    if window_size > len(l):
        raise ValueError("Window size must be smaller or equal to the number of elements in the list.")

    t = []
    for i in xrange(0, window_size):
        t.append(l[i:])

    return zip(*t)

print sliding_window(mylist, 3)

Output:

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]

回答 18

这是一个老问题,但对仍然感兴趣的人来说,这个页面上有一个使用生成器实现滑动窗口的很好的实现(作者:Adrian Rosebrock)。

它是OpenCV的一种实现,但是您可以轻松地将其用于其他目的。对于急切的人,我将代码粘贴在这里,但是为了更好地理解它,我建议访问原始页面。

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

提示:在迭代该生成器时,您可以检查窗口的 .shape,以丢弃那些不符合要求的窗口

干杯

This is an old question but for those still interested there is a great implementation of a window slider using generators in this page (by Adrian Rosebrock).

It is an implementation for OpenCV however you can easily use it for any other purpose. For the eager ones i’ll paste the code here but to understand it better I recommend visiting the original page.

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

Tip: You can check the .shape of the window when iterating the generator to discard those that do not meet your requirements

Cheers
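A hedged usage sketch of the tip above (the dummy array and window size are mine, not from the original post; note the quoted function uses Python 2's xrange, so substitute range on Python 3):

import numpy as np

image = np.zeros((100, 130), dtype=np.uint8)  # stand-in "image"; any 2-D array works
(winW, winH) = (64, 64)

for (x, y, window) in sliding_window(image, stepSize=32, windowSize=(winW, winH)):
    # discard partial windows clipped at the right/bottom edges
    if window.shape[0] != winH or window.shape[1] != winW:
        continue
    print(x, y, window.shape)  # process the full-sized window here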


回答 19

修改了DiPaolo的答案以允许任意填充和可变步长

import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(itertools.islice(it, n))    
    if len(result) == n:
        yield result
    while True:        
#         for elem in it:        
        elem = tuple( next(it, fill) for _ in range(step))
        result = result[step:] + elem        
        if elem[-1] is fill:
            if keep:
                yield result
            break
        yield result

Modified DiPaolo’s answer to allow arbitrary fill and variable step size

import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(itertools.islice(it, n))    
    if len(result) == n:
        yield result
    while True:        
#         for elem in it:        
        elem = tuple( next(it, fill) for _ in range(step))
        result = result[step:] + elem        
        if elem[-1] is fill:
            if keep:
                yield result
            break
        yield result

回答 20

这是一行代码。我对它进行了计时,它与最佳答案的性能相当,并随着seq的增大而逐渐变好,len(seq)= 20时变慢了20%,len(seq)= 10000时变慢了7%

zip(*[seq[i:(len(seq) - n + 1 + i)] for i in range(n)])

here is a one liner. I timed it and it's comparable to the performance of the top answer, getting progressively better with larger seq: from 20% slower with len(seq) = 20 to 7% slower with len(seq) = 10000

zip(*[seq[i:(len(seq) - n + 1 + i)] for i in range(n)])

回答 21

这是我用 islice 写的简单、单行、比较 pythonic 的做法。不过,效率可能不是最优的。

from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

说明:通过使用window_size的islice创建窗口,并使用map在所有数组上进行迭代。

Here's my take: a simple, one-liner, pythonic way using islice. It may not be optimally efficient, though.

from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

Explanation: Create window by using islice of window_size and iterate this operation using map over all array.


回答 22

深度学习中滑动窗口数据的优化功能

def SlidingWindow(X, window_length, stride):
    indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
    return X.take(indexer)

Optimized Function for sliding window data in Deep learning

def SlidingWindow(X, window_length, stride):
    indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
    return X.take(indexer)

to apply on multidimensional array

import numpy as np
def SlidingWindow(X, window_length, stride1):
    stride=  X.shape[1]*stride1
    window_length = window_length*X.shape[1]
    indexer = np.arange(window_length)[None, :] + stride1*np.arange(int(len(X)/stride1)-window_length-1)[:, None]
    return X.take(indexer)

将多个过滤器应用于pandas DataFrame或Series的有效方法

问题:将多个过滤器应用于pandas DataFrame或Series的有效方法

我有一种情况,用户想要将多个过滤器应用于Pandas DataFrame或Series对象。本质上,我想有效地将用户在运行时指定的一堆过滤(比较操作)链接在一起。

过滤器应为可加性的(又称每个过滤器应缩小结果)。

我当前正在使用,reindex()但这每次都会创建一个新对象并复制基础数据(如果我正确理解了文档)。因此,这在过滤大型Series或DataFrame时可能效率很低。

我认为使用apply(),map()或类似的方法可能更好。不过我对 Pandas 还很陌生,所以仍在努力把这一切弄明白。

TL; DR

我想采用以下形式的字典,并将每个操作应用于给定的Series对象,并返回一个“过滤后的” Series对象。

relops = {'>=': [1], '<=': [1]}

长例子

我将从当前的示例开始,仅过滤单个Series对象。以下是我当前正在使用的功能:

   def apply_relops(series, relops):
        """
        Pass dictionary of relational operators to perform on given series object
        """
        for op, vals in relops.iteritems():
            op_func = ops[op]
            for val in vals:
                filtered = op_func(series, val)
                series = series.reindex(series[filtered])
        return series

用户向字典提供他们要执行的操作:

>>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
>>> print df
>>> print df
   col1  col2
0     0    10
1     1    11
2     2    12

>>> from operator import le, ge
>>> ops ={'>=': ge, '<=': le}
>>> apply_relops(df['col1'], {'>=': [1]})
col1
1       1
2       2
Name: col1
>>> apply_relops(df['col1'], relops = {'>=': [1], '<=': [1]})
col1
1       1
Name: col1

同样,上述方法的“问题”是,我认为中间步骤之间存在很多不必要的数据复制。

另外,我想对此进行扩展,以便传入的字典可以包括要操作的列,并根据输入字典过滤整个DataFrame。但是,我假设该系列的所有工作都可以轻松扩展到DataFrame。

I have a scenario where a user wants to apply several filters to a Pandas DataFrame or Series object. Essentially, I want to efficiently chain a bunch of filtering (comparison operations) together that are specified at run-time by the user.

The filters should be additive (aka each one applied should narrow results).

I’m currently using reindex() but this creates a new object each time and copies the underlying data (if I understand the documentation correctly). So, this could be really inefficient when filtering a big Series or DataFrame.

I’m thinking that using apply(), map(), or something similar might be better. I’m pretty new to Pandas though so still trying to wrap my head around everything.

TL;DR

I want to take a dictionary of the following form and apply each operation to a given Series object and return a ‘filtered’ Series object.

relops = {'>=': [1], '<=': [1]}

Long Example

I’ll start with an example of what I have currently and just filtering a single Series object. Below is the function I’m currently using:

   def apply_relops(series, relops):
        """
        Pass dictionary of relational operators to perform on given series object
        """
        for op, vals in relops.iteritems():
            op_func = ops[op]
            for val in vals:
                filtered = op_func(series, val)
                series = series.reindex(series[filtered])
        return series

The user provides a dictionary with the operations they want to perform:

>>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
>>> print df
>>> print df
   col1  col2
0     0    10
1     1    11
2     2    12

>>> from operator import le, ge
>>> ops ={'>=': ge, '<=': le}
>>> apply_relops(df['col1'], {'>=': [1]})
col1
1       1
2       2
Name: col1
>>> apply_relops(df['col1'], relops = {'>=': [1], '<=': [1]})
col1
1       1
Name: col1

Again, the ‘problem’ with my above approach is that I think there is a lot of possibly unnecessary copying of the data for the in-between steps.

Also, I would like to expand this so that the dictionary passed in can include the columns to operator on and filter an entire DataFrame based on the input dictionary. However, I’m assuming whatever works for the Series can be easily expanded to a DataFrame.


回答 0

熊猫(和numpy)允许使用布尔索引,这将更加高效:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]: 
1    1
2    2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]: 
   col1  col2
1     1    11
2     2    12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <=1 )]
Out[13]: 
   col1  col2
1     1    11

如果要为此编写辅助函数,请考虑以下方面:

In [14]: def b(x, col, op, n): 
             return op(x[col],n)

In [15]: def f(x, *b):
             return x[(np.logical_and(*b))]

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]: 
   col1  col2
1     1    11

更新:pandas 0.13针对此类用例提供了一种查询方法,假设列名是有效的标识符,则可以进行以下工作(并且对于大型框架,在幕后使用numexpr可能更为有效):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11

Pandas (and numpy) allow for boolean indexing, which will be much more efficient:

In [11]: df.loc[df['col1'] >= 1, 'col1']
Out[11]: 
1    1
2    2
Name: col1

In [12]: df[df['col1'] >= 1]
Out[12]: 
   col1  col2
1     1    11
2     2    12

In [13]: df[(df['col1'] >= 1) & (df['col1'] <=1 )]
Out[13]: 
   col1  col2
1     1    11

If you want to write helper functions for this, consider something along these lines:

In [14]: def b(x, col, op, n): 
             return op(x[col],n)

In [15]: def f(x, *b):
             return x[(np.logical_and(*b))]

In [16]: b1 = b(df, 'col1', ge, 1)

In [17]: b2 = b(df, 'col1', le, 1)

In [18]: f(df, b1, b2)
Out[18]: 
   col1  col2
1     1    11

Update: pandas 0.13 has a query method for these kind of use cases, assuming column names are valid identifiers the following works (and can be more efficient for large frames as it uses numexpr behind the scenes):

In [21]: df.query('col1 <= 1 & 1 <= col1')
Out[21]:
   col1  col2
1     1    11

回答 1

链接条件会产生长行,而pep8不建议这样做。使用.query方法会强制使用字符串,该字符串功能强大但不具有Python特色,而且不是很动态。

一旦每个过滤器都安装到位,一种方法是

import numpy as np
import functools
def conjunction(*conditions):
    return functools.reduce(np.logical_and, conditions)

c_1 = data.col1 == True
c_2 = data.col2 < 64
c_3 = data.col3 != 4

data_filtered = data[conjunction(c_1, c_2, c_3)]

np.logical_and 作用于两个数组且速度很快,但一次最多只接受两个参数;更多参数的情况交由 functools.reduce 处理。

请注意,这仍然有一些冗余:a) 短路求值不会在全局层面发生;b) 每个单独条件都在完整的初始数据上计算。即便如此,我预计这对许多应用来说已经足够高效,而且可读性很好。

Chaining conditions creates long lines, which are discouraged by pep8. Using the .query method forces to use strings, which is powerful but unpythonic and not very dynamic.

Once each of the filters is in place, one approach is

import numpy as np
import functools
def conjunction(*conditions):
    return functools.reduce(np.logical_and, conditions)

c_1 = data.col1 == True
c_2 = data.col2 < 64
c_3 = data.col3 != 4

data_filtered = data[conjunction(c_1, c_2, c_3)]

np.logical_and operates on two arrays and is fast, but it does not take more than two arguments at a time, which is handled by functools.reduce.

Note that this still has some redundancies: a) shortcutting does not happen on a global level b) Each of the individual conditions runs on the whole initial data. Still, I expect this to be efficient enough for many applications and it is very readable.

You can also make a disjunction (wherein only one of the conditions needs to be true) by using np.logical_or instead:

import numpy as np
import functools
def disjunction(*conditions):
    return functools.reduce(np.logical_or, conditions)

c_1 = data.col1 == True
c_2 = data.col2 < 64
c_3 = data.col3 != 4

data_filtered = data[disjunction(c_1, c_2, c_3)]

回答 2

最简单的解决方案:

用:

filtered_df = df[(df['col1'] >= 1) & (df['col1'] <= 5)]

另一个示例,要过滤数据框以查找属于Feb-2018的值,请使用以下代码

filtered_df = df[(df['year'] == 2018) & (df['month'] == 2)]

Simplest of All Solutions:

Use:

filtered_df = df[(df['col1'] >= 1) & (df['col1'] <= 5)]

Another Example, To filter the dataframe for values belonging to Feb-2018, use the below code

filtered_df = df[(df['year'] == 2018) & (df['month'] == 2)]

回答 3

由于pandas 0.22更新,提供了比较选项,例如:

  • gt(大于)
  • lt(小于)
  • eq(等于)
  • ne(不等于)
  • ge(大于或等于)

还有很多。这些函数返回布尔数组。让我们看看如何使用它们:

# sample data
df = pd.DataFrame({'col1': [0, 1, 2,3,4,5], 'col2': [10, 11, 12,13,14,15]})

# get values from col1 greater than or equals to 1
df.loc[df['col1'].ge(1),'col1']

1    1
2    2
3    3
4    4
5    5

# where col1 values are between 0 and 2
df.loc[df['col1'].between(0,2)]

 col1 col2
0   0   10
1   1   11
2   2   12

# where col1 > 1
df.loc[df['col1'].gt(1)]

 col1 col2
2   2   12
3   3   13
4   4   14
5   5   15

Since pandas 0.22 update, comparison options are available like:

  • gt (greater than)
  • lt (lesser than)
  • eq (equals to)
  • ne (not equals to)
  • ge (greater than or equals to)

and many more. These functions return boolean array. Let’s see how we can use them:

# sample data
df = pd.DataFrame({'col1': [0, 1, 2,3,4,5], 'col2': [10, 11, 12,13,14,15]})

# get values from col1 greater than or equals to 1
df.loc[df['col1'].ge(1),'col1']

1    1
2    2
3    3
4    4
5    5

# where col1 values are between 0 and 2
df.loc[df['col1'].between(0,2)]

 col1 col2
0   0   10
1   1   11
2   2   12

# where col1 > 1
df.loc[df['col1'].gt(1)]

 col1 col2
2   2   12
3   3   13
4   4   14
5   5   15

回答 4

为什么不这样做呢?

def filt_spec(df, col, val, op):
    import operator
    ops = {'eq': operator.eq, 'neq': operator.ne, 'gt': operator.gt, 'ge': operator.ge, 'lt': operator.lt, 'le': operator.le}
    return df[ops[op](df[col], val)]
pandas.DataFrame.filt_spec = filt_spec

演示:

df = pd.DataFrame({'a': [1,2,3,4,5], 'b':[5,4,3,2,1]})
df.filt_spec('a', 2, 'ge')

结果:

   a  b
 1  2  4
 2  3  3
 3  4  2
 4  5  1

您可以看到列“ a”已被过滤,其中> = 2。

这比运算符链式写法略快(指打字时间,不是运行性能)。当然,您可以把 import 放到文件顶部。

Why not do this?

def filt_spec(df, col, val, op):
    import operator
    ops = {'eq': operator.eq, 'neq': operator.ne, 'gt': operator.gt, 'ge': operator.ge, 'lt': operator.lt, 'le': operator.le}
    return df[ops[op](df[col], val)]
pandas.DataFrame.filt_spec = filt_spec

Demo:

df = pd.DataFrame({'a': [1,2,3,4,5], 'b':[5,4,3,2,1]})
df.filt_spec('a', 2, 'ge')

Result:

   a  b
 1  2  4
 2  3  3
 3  4  2
 4  5  1

You can see that column ‘a’ has been filtered where a >=2.

This is slightly faster (typing time, not performance) than operator chaining. You could of course put the import at the top of the file.


回答 5

我们还可以基于列值不在某个列表(或任何可迭代对象)中这一条件来选择行。我们会像之前一样创建布尔变量,但这次在前面加上 ~ 对布尔变量取反。

例如

list = [1, 0]
df[~df.col1.isin(list)]

We can also select rows based on the values of a column that are not in a list or any iterable. We will create a boolean variable just like before, but now we will negate it by placing ~ in front.

For example

list = [1, 0]
df[~df.col1.isin(list)]

如何检查两个列表在Python中是否循环相同

问题:如何检查两个列表在Python中是否循环相同

例如,我有以下列表:

a[0] = [1, 1, 1, 0, 0]
a[1] = [1, 1, 0, 0, 1]
a[2] = [0, 1, 1, 1, 0]
# and so on

它们似乎不同,但是如果假定起点和终点相连,那么它们在循环上是相同的。

问题是,每个列表的长度为55,并且仅包含三个1和52个零。如果没有循环条件,则有26,235(55选择3)个列表。但是,如果存在条件“循环”,则存在大量循环相同的列表

目前,我通过以下方式循环检查身份:

def is_dup(a, b):
    for i in range(len(a)):
        if a == list(numpy.roll(b, i)): # shift b circularly by i
            return True
    return False

在最坏的情况下,此功能需要55次循环移位操作。并且有26,235个列表可以相互比较。简而言之,我需要55 * 26,235 *(26,235-1)/ 2 = 18,926,847,225个计算。大约有20 Giga!

有什么好方法可以减少计算量吗?还是支持循环的任何数据类型?

For instance, I have lists:

a[0] = [1, 1, 1, 0, 0]
a[1] = [1, 1, 0, 0, 1]
a[2] = [0, 1, 1, 1, 0]
# and so on

They seem to be different, but if it is supposed that the start and the end are connected, then they are circularly identical.

The problem is, each list which I have has a length of 55 and contains only three ones and 52 zeros in it. Without circular condition, there are 26,235 (55 choose 3) lists. However, if the condition ‘circular’ exists, there are a huge number of circularly identical lists

Currently I check circularly identity by following:

def is_dup(a, b):
    for i in range(len(a)):
        if a == list(numpy.roll(b, i)): # shift b circularly by i
            return True
    return False

This function requires 55 cyclic shift operations at the worst case. And there are 26,235 lists to be compared with each other. In short, I need 55 * 26,235 * (26,235 – 1) / 2 = 18,926,847,225 computations. It’s about nearly 20 Giga!

Is there any good way to do it with less computations? Or any data types that supports circular?


回答 0

首先,就列表长度而言,这可以在 O(n) 内完成。您会注意到,如果把列表重复两次(把 [1, 2, 3] 变成 [1, 2, 3, 1, 2, 3]),那么新列表必然包含所有可能的循环移位列表。

因此,您所需要做的就是检查您要搜索的列表是否在起始列表的2倍之内。在python中,您可以通过以下方式实现此功能(假设长度相同)。

list1 = [1, 1, 1, 0, 0]
list2 = [1, 1, 0, 0, 1]
print ' '.join(map(str, list2)) in ' '.join(map(str, list1 * 2))

关于我的oneliner的一些解释: list * 2将列表与自身结合,map(str, [1, 2])将所有数字转换为字符串, ' '.join()并将数组['1', '2', '111']转换为字符串'1 2 111'

正如某些人在评论中指出的那样,oneliner可能会给出一些误报,因此涵盖了所有可能的极端情况:

def isCircular(arr1, arr2):
    if len(arr1) != len(arr2):
        return False

    str1 = ' '.join(map(str, arr1))
    str2 = ' '.join(map(str, arr2))
    if len(str1) != len(str2):
        return False

    return str1 in str2 + ' ' + str2

P.S.1 谈到时间复杂度时值得注意:只有当子串查找本身能在 O(n) 时间内完成时,整体才能达到 O(n)。这并非总是成立,取决于所用语言的实现(不过原则上可以在线性时间内完成,例如用 KMP)。

P.S.2 写给那些对字符串操作心存顾虑、因而认为这个答案不够好的人:重要的是复杂度和速度。该算法有望以 O(n) 时间和 O(n) 空间运行,这使它远好于任何 O(n^2) 范畴内的方案。想亲自验证,可以运行一个小基准测试(创建一个随机列表,弹出第一个元素并追加到末尾,从而得到一个循环移位的列表;您也可以自行做别的变换):

from random import random
bigList = [int(1000 * random()) for i in xrange(10**6)]
bigList2 = bigList[:]
bigList2.append(bigList2.pop(0))

# then test how much time will it take to come up with an answer
from datetime import datetime
startTime = datetime.now()
print isCircular(bigList, bigList2)
print datetime.now() - startTime    # please fill free to use timeit, but it will give similar results

在我的机器上0.3秒。不太长。现在尝试将此与O(n^2)解决方案进行比较。在进行比较时,您可以从美国到澳大利亚旅行(很可能乘游轮旅行)

First off, this can be done in O(n) in terms of the length of the list You can notice that if you will duplicate your list 2 times ([1, 2, 3]) will be [1, 2, 3, 1, 2, 3] then your new list will definitely hold all possible cyclic lists.

So all you need is to check whether the list you are searching is inside a 2 times of your starting list. In python you can achieve this in the following way (assuming that the lengths are the same).

list1 = [1, 1, 1, 0, 0]
list2 = [1, 1, 0, 0, 1]
print ' '.join(map(str, list2)) in ' '.join(map(str, list1 * 2))

Some explanation about my oneliner: list * 2 will combine a list with itself, map(str, [1, 2]) convert all numbers to string and ' '.join() will convert array ['1', '2', '111'] into a string '1 2 111'.

As pointed by some people in the comments, oneliner can potentially give some false positives, so to cover all the possible edge cases:

def isCircular(arr1, arr2):
    if len(arr1) != len(arr2):
        return False

    str1 = ' '.join(map(str, arr1))
    str2 = ' '.join(map(str, arr2))
    if len(str1) != len(str2):
        return False

    return str1 in str2 + ' ' + str2

P.S.1 when speaking about time complexity, it is worth noticing that O(n) will be achieved if substring can be found in O(n) time. It is not always so and depends on the implementation in your language (although potentially it can be done in linear time KMP for example).

P.S.2 for people who are afraid strings operation and due to this fact think that the answer is not good. What important is complexity and speed. This algorithm potentially runs in O(n) time and O(n) space which makes it much better than anything in O(n^2) domain. To see this by yourself, you can run a small benchmark (creates a random list pops the first element and appends it to the end thus creating a cyclic list. You are free to do your own manipulations)

from random import random
bigList = [int(1000 * random()) for i in xrange(10**6)]
bigList2 = bigList[:]
bigList2.append(bigList2.pop(0))

# then test how much time will it take to come up with an answer
from datetime import datetime
startTime = datetime.now()
print isCircular(bigList, bigList2)
print datetime.now() - startTime    # please fill free to use timeit, but it will give similar results

0.3 seconds on my machine. Not really long. Now try to compare this with O(n^2) solutions. While it is comparing it, you can travel from US to Australia (most probably by a cruise ship)


回答 1

我对 Python 的了解不足以用您要求的语言回答这个问题,但在 C/C++ 中,考虑到问题的参数,我会把这些 0 和 1 转换成位,压进一个 uint64_t 的最低有效位。这样一来,全部 55 位可以一次比较完,只需 1 个时钟周期。

速度极快,整个过程都可以放入片上缓存(209,880字节)。仅在CPU的寄存器中提供对所有55个列表成员同时右移的硬件支持。同时比较所有55个成员时也是如此。这样可以将问题一对一地映射到软件解决方案。(并使用SIMD / SSE 256位寄存器,如果需要,最多可使用256个成员),因此,该代码对于读者而言立即显而易见。

您也许可以在Python中实现此功能,但我只是不太了解它,无法知道这是否可行或性能如何。

在它上睡觉之后,一些事情变得显而易见,并且一切都变得更好。

1.)使用位旋转循环链接列表非常容易,因此不必使用Dali的巧妙技巧。在64位寄存器内部,标准移位将非常简单地完成旋转,并通过使用算术而非位操作来使这一切对Python更友好。

2.)使用2分频可以轻松完成位移。

3.)通过模2可以很容易地检查列表末尾的0或1。

4.) 把 0 从列表尾部"移动"到头部,只需除以 2。这是因为即使真的移动了这个 0,它也只是把第 55 位置为 false,而什么都不做时第 55 位本来就是 false。

5.) 把 1 从列表尾部移动到头部,可以通过除以 2 再加上 18,014,398,509,481,984 来完成,这个值就是把第 55 位置为 true、其余全部置为 false 得到的。

6.)如果在任何给定的旋转之后,锚点与组合uint64_t的比较为TRUE,则中断并返回TRUE。

我会将整个列表数组直接转换为uint64_ts数组,以避免重复进行转换。

在花了几个小时研究汇编、尝试优化代码之后,我把运行时间又缩短了 20%。还要补充一句,操作系统和 MSVC 编译器昨天中午也更新了;不知出于什么原因,更新(11/15/2014)之后 C 编译器生成的代码质量显著提高。现在,构造锚环并与测试环的全部 55 种旋转逐一比较只需约 70 个时钟周期(约 17 纳秒),把所有环两两(NxN)比较一遍只需 12.5 秒。

这段代码非常紧凑,除了 4 个寄存器之外,其余寄存器 99% 的时间都闲着。汇编代码与 C 代码几乎逐行对应,非常易读易懂。如果有人想自学汇编,这会是一个很棒的汇编语言项目。

硬件是 Haswell i7,MSVC 64 位,全部优化开启。

#include "stdafx.h"
#include "stdafx.h"
#include <string>
#include <memory>
#include <stdio.h>
#include <time.h>

const uint8_t  LIST_LENGTH = 55;    // uint_8 supports full width of SIMD and AVX2
// max left shifts is 32, so must use right shifts to create head_bit
const uint64_t head_bit = (0x8000000000000000 >> (64 - LIST_LENGTH)); 
const uint64_t CPU_FREQ = 3840000000;   // turbo-mode clock freq of my i7 chip

const uint64_t LOOP_KNT = 688275225; // 26235^2 // 1000000000;

// ----------------------------------------------------------------------------
__inline uint8_t is_circular_identical(const uint64_t anchor_ring, uint64_t test_ring)
{
    // By trial and error, try to synch 2 circular lists by holding one constant
    //   and turning the other 0 to LIST_LENGTH positions. Return compare count.

    // Return the number of tries which aligned the circularly identical rings, 
    //  where any non-zero value is treated as a bool TRUE. Return a zero/FALSE,
    //  if all tries failed to find a sequence match. 
    // If anchor_ring and test_ring are equal to start with, return one.

    for (uint8_t i = LIST_LENGTH; i;  i--)
    {
        // This function could be made bool, returning TRUE or FALSE, but
        // as a debugging tool, knowing the try_knt that got a match is nice.
        if (anchor_ring == test_ring) {  // test all 55 list members simultaneously
            return (LIST_LENGTH +1) - i;
        }

        if (test_ring % 2) {    //  ring's tail is 1 ?
            test_ring /= 2;     //  right-shift 1 bit
            // if the ring tail was 1, set head to 1 to simulate wrapping
            test_ring += head_bit;      
        }   else    {           // ring's tail must be 0
            test_ring /= 2;     // right-shift 1 bit
            // if the ring tail was 0, doing nothing leaves head a 0
        }
    }
    // if we got here, they can't be circularly identical
    return 0;
}
// ----------------------------------------------------------------------------
    int main(void)  {
        time_t start = clock();
        uint64_t anchor, test_ring, i,  milliseconds;
        uint8_t try_knt;

        anchor = 31525197391593472; // bits 55,54,53 set true, all others false
        // Anchor right-shifted LIST_LENGTH/2 represents the average search turns
        test_ring = anchor >> (1 + (LIST_LENGTH / 2)); //  117440512; 

        printf("\n\nRunning benchmarks for %llu loops.", LOOP_KNT);
        start = clock();
        for (i = LOOP_KNT; i; i--)  {
            try_knt = is_circular_identical(anchor, test_ring);
            // The shifting of test_ring below is a test fixture to prevent the 
            //  optimizer from optimizing the loop away and returning instantly
            if (i % 2) {
                test_ring /= 2;
            }   else  {
                test_ring *= 2;
            }
        }
        milliseconds = (uint64_t)(clock() - start);
        printf("\nET for is_circular_identical was %f milliseconds."
                "\n\tLast try_knt was %u for test_ring list %llu", 
                        (double)milliseconds, try_knt, test_ring);

        printf("\nConsuming %7.1f clocks per list.\n",
                (double)((milliseconds * (CPU_FREQ / 1000)) / (uint64_t)LOOP_KNT));

        getchar();
        return 0;
}


Not knowledgeable enough in Python to answer this in your requested language, but in C/C++, given the parameters of your question, I’d convert the zeros and ones to bits and push them onto the least significant bits of an uint64_t. This will allow you to compare all 55 bits in one fell swoop – 1 clock.

Wickedly fast, and the whole thing will fit in on-chip caches (209,880 bytes). Hardware support for shifting all 55 list members right simultaneously is available only in a CPU’s registers. The same goes for comparing all 55 members simultaneously. This allows for a 1-for-1 mapping of the problem to a software solution. (and using the SIMD/SSE 256 bit registers, up to 256 members if needed) As a result the code is immediately obvious to the reader.

You might be able to implement this in Python, I just don’t know it well enough to know if that’s possible or what the performance might be.

After sleeping on it a few things became obvious, and all for the better.

1.) It’s so easy to spin the circularly linked list using bits that Dali’s very clever trick isn’t necessary. Inside a 64-bit register standard bit shifting will accomplish the rotation very simply, and in an attempt to make this all more Python friendly, by using arithmetic instead of bit ops.

2.) Bit shifting can be accomplished easily using divide by 2.

3.) Checking the end of the list for 0 or 1 can be easily done by modulo 2.

4.) “Moving” a 0 to the head of the list from the tail can be done by dividing by 2. This because if the zero were actually moved it would make the 55th bit false, which it already is by doing absolutely nothing.

5.) “Moving” a 1 to the head of the list from the tail can be done by dividing by 2 and adding 18,014,398,509,481,984 – which is the value created by marking the 55th bit true and all the rest false.

6.) If a comparison of the anchor and composed uint64_t is TRUE after any given rotation, break and return TRUE.

I would convert the entire array of lists into an array of uint64_ts right up front to avoid having to do the conversion repeatedly.

After spending a few hours trying to optimize the code, studying the assembly language I was able to shave 20% off the runtime. I should add that the O/S and MSVC compiler got updated mid-day yesterday as well. For whatever reason/s, the quality of the code the C compiler produced improved dramatically after the update (11/15/2014). Run-time is now ~ 70 clocks, 17 nanoseconds to compose and compare an anchor ring with all 55 turns of a test ring and NxN of all rings against all others is done in 12.5 seconds.

This code is so tight all but 4 registers are sitting around doing nothing 99% of the time. The assembly language matches the C code almost line for line. Very easy to read and understand. A great assembly project if someone were teaching themselves that.

Hardware is Haswell i7, MSVC 64-bit, full optimizations.

#include "stdafx.h"
#include "stdafx.h"
#include <string>
#include <memory>
#include <stdio.h>
#include <time.h>

const uint8_t  LIST_LENGTH = 55;    // uint_8 supports full width of SIMD and AVX2
// max left shifts is 32, so must use right shifts to create head_bit
const uint64_t head_bit = (0x8000000000000000 >> (64 - LIST_LENGTH)); 
const uint64_t CPU_FREQ = 3840000000;   // turbo-mode clock freq of my i7 chip

const uint64_t LOOP_KNT = 688275225; // 26235^2 // 1000000000;

// ----------------------------------------------------------------------------
__inline uint8_t is_circular_identical(const uint64_t anchor_ring, uint64_t test_ring)
{
    // By trial and error, try to synch 2 circular lists by holding one constant
    //   and turning the other 0 to LIST_LENGTH positions. Return compare count.

    // Return the number of tries which aligned the circularly identical rings, 
    //  where any non-zero value is treated as a bool TRUE. Return a zero/FALSE,
    //  if all tries failed to find a sequence match. 
    // If anchor_ring and test_ring are equal to start with, return one.

    for (uint8_t i = LIST_LENGTH; i;  i--)
    {
        // This function could be made bool, returning TRUE or FALSE, but
        // as a debugging tool, knowing the try_knt that got a match is nice.
        if (anchor_ring == test_ring) {  // test all 55 list members simultaneously
            return (LIST_LENGTH +1) - i;
        }

        if (test_ring % 2) {    //  ring's tail is 1 ?
            test_ring /= 2;     //  right-shift 1 bit
            // if the ring tail was 1, set head to 1 to simulate wrapping
            test_ring += head_bit;      
        }   else    {           // ring's tail must be 0
            test_ring /= 2;     // right-shift 1 bit
            // if the ring tail was 0, doing nothing leaves head a 0
        }
    }
    // if we got here, they can't be circularly identical
    return 0;
}
// ----------------------------------------------------------------------------
    int main(void)  {
        time_t start = clock();
        uint64_t anchor, test_ring, i,  milliseconds;
        uint8_t try_knt;

        anchor = 31525197391593472; // bits 55,54,53 set true, all others false
        // Anchor right-shifted LIST_LENGTH/2 represents the average search turns
        test_ring = anchor >> (1 + (LIST_LENGTH / 2)); //  117440512; 

        printf("\n\nRunning benchmarks for %llu loops.", LOOP_KNT);
        start = clock();
        for (i = LOOP_KNT; i; i--)  {
            try_knt = is_circular_identical(anchor, test_ring);
            // The shifting of test_ring below is a test fixture to prevent the 
            //  optimizer from optimizing the loop away and returning instantly
            if (i % 2) {
                test_ring /= 2;
            }   else  {
                test_ring *= 2;
            }
        }
        milliseconds = (uint64_t)(clock() - start);
        printf("\nET for is_circular_identical was %f milliseconds."
                "\n\tLast try_knt was %u for test_ring list %llu", 
                        (double)milliseconds, try_knt, test_ring);

        printf("\nConsuming %7.1f clocks per list.\n",
                (double)((milliseconds * (CPU_FREQ / 1000)) / (uint64_t)LOOP_KNT));

        getchar();
        return 0;
}



回答 2

从字里行间看,您似乎是想为每个由 3 个 1 和 52 个 0 组成的字符串的循环等价类枚举一个代表元。我们先把稠密表示换成稀疏表示(即 range(55) 中三个数字组成的集合)。在这种表示下,s 循环移位 k 的结果由推导式 set((i + k) % 55 for i in s) 给出。每个等价类中字典序最小的代表元总是包含位置 0。给定形如 {0, i, j}(其中 0 < i < j)的集合,该类中其他可能成为最小者的候选是 {0, j - i, 55 - i} 和 {0, 55 - j, 55 + i - j}。因此,要让原集合是最小代表元,需要满足 (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j))。下面是一些枚举代码。

def makereps():
    reps = []
    for i in range(1, 55 - 1):
        for j in range(i + 1, 55):
            if (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)):
                reps.append('1' + '0' * (i - 1) + '1' + '0' * (j - i - 1) + '1' + '0' * (55 - j - 1))
    return reps

Reading between the lines, it sounds as though you’re trying to enumerate one representative of each circular equivalence class of strings with 3 ones and 52 zeros. Let’s switch from a dense representation to a sparse one (set of three numbers in range(55)). In this representation, the circular shift of s by k is given by the comprehension set((i + k) % 55 for i in s). The lexicographic minimum representative in a class always contains the position 0. Given a set of the form {0, i, j} with 0 < i < j, the other candidates for minimum in the class are {0, j - i, 55 - i} and {0, 55 - j, 55 + i - j}. Hence, we need (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)) for the original to be minimum. Here’s some enumeration code.

def makereps():
    reps = []
    for i in range(1, 55 - 1):
        for j in range(i + 1, 55):
            if (i, j) <= min((j - i, 55 - i), (55 - j, 55 + i - j)):
                reps.append('1' + '0' * (i - 1) + '1' + '0' * (j - i - 1) + '1' + '0' * (55 - j - 1))
    return reps
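As a quick sanity check (my own reasoning, not part of the original answer): since neither 5 nor 11, the proper divisors of 55, divides 3, no pattern with exactly three ones can be fixed by a nontrivial rotation, so every equivalence class has exactly 55 members and the enumeration should produce 26235 / 55 = 477 representatives:

reps = makereps()
print(len(reps))  # expected: 477 (= 26235 / 55)
print(all(len(r) == 55 and r.count('1') == 3 for r in reps))  # True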

回答 3

重复第一个数组,然后使用Z算法(O(n)时间)在第一个数组中找到第二个数组。

(注意:您不必物理复制第一个数组。只需在匹配过程中进行环绕即可。)

Z算法的优点是,与KMP,BM等相比,它非常简单。
但是,如果您有雄心,也可以在线性时间、常数空间内完成字符串匹配,例如 strstr 就是这么做的。不过实现起来会更痛苦。

Repeat the first array, then use the Z algorithm (O(n) time) to find the second array inside the first.

(Note: you don’t have to physically copy the first array. You can just wrap around during matching.)

The nice thing about the Z algorithm is that it’s very simple compared to KMP, BM, etc.
However, if you’re feeling ambitious, you could do string matching in linear time and constant space — strstr, for example, does this. Implementing it would be more painful, though.
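The Z algorithm itself isn't spelled out in the answer; here is a minimal Python sketch of the idea (this version physically doubles the second list for clarity, unlike the wrap-around trick the answer mentions):

def z_array(s):
    # z[i] = length of the longest common prefix of s and s[i:]
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z

def is_rotation(a, b):
    # a is a circular rotation of b  <=>  a occurs as a contiguous run in b + b
    if len(a) != len(b):
        return False
    sep = object()  # sentinel that never compares equal to a real element
    s = list(a) + [sep] + list(b) + list(b)
    z = z_array(s)
    return any(z[i] >= len(a) for i in range(len(a) + 1, len(s)))

print(is_rotation([1, 1, 0, 0, 1], [1, 1, 1, 0, 0]))  # True
print(is_rotation([1, 0, 1, 0, 1], [1, 1, 1, 0, 0]))  # False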


回答 4

紧随Salvador Dali的非常聪明的解决方案之后,处理该问题的最佳方法是确保所有元素的长度相同,并且两个LIST的长度相同。

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    lst1, lst2 = map(str, lst1), map(str, lst2)
    len_longest_element = max(map(len, lst1))
    template = "{{:{}}}".format(len_longest_element)
    circ_lst = " ".join([template.format(el) for el in lst1]) * 2
    return " ".join([template.format(el) for el in lst2]) in circ_lst

不清楚这比 Salvador Dali 答案下 AshwiniChaudhary 建议的正则表达式方案更快还是更慢,该方案如下:

import re

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    return bool(re.search(r"\b{}\b".format(' '.join(map(str, lst2))),
                          ' '.join(map(str, lst1)) * 2))

Following up on Salvador Dali’s very smart solution, the best way to handle it is to make sure all elements are of the same length, as well as both LISTS are of the same length.

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    lst1, lst2 = map(str, lst1), map(str, lst2)
    len_longest_element = max(map(len, lst1))
    template = "{{:{}}}".format(len_longest_element)
    circ_lst = " ".join([template.format(el) for el in lst1]) * 2
    return " ".join([template.format(el) for el in lst2]) in circ_lst

No clue if this is faster or slower than AshwiniChaudhary’s recommended regex solution in Salvador Dali’s answer, which reads:

import re

def is_circular_equal(lst1, lst2):
    if len(lst1) != len(lst2):
        return False
    return bool(re.search(r"\b{}\b".format(' '.join(map(str, lst2))),
                          ' '.join(map(str, lst1)) * 2))

回答 5

假设您需要进行大量比较,那么在初次遍历列表时将它们转换为某种可以轻松比较的规范形式是否值得?

您是否要获取一组圆形唯一列表?如果是这样,您可以在转换为元组后将它们放入集合中。

def normalise(lst):
    # Pick the 'maximum' out of all cyclic options
    return max([lst[i:]+lst[:i] for i in range(len(lst))])

a_normalised = map(normalise,a)
a_tuples = map(tuple,a_normalised)
a_unique = set(a_tuples)

很抱歉没有注意到 David Eisenstat 的答案与本答案非常相似。

Given that you need to do so many comparisons might it be worth your while taking an initial pass through your lists to convert them into some sort of canonical form that can be easily compared?

Are you trying to get a set of circularly-unique lists? If so you can throw them into a set after converting to tuples.

def normalise(lst):
    # Pick the 'maximum' out of all cyclic options
    return max([lst[i:]+lst[:i] for i in range(len(lst))])

a_normalised = map(normalise,a)
a_tuples = map(tuple,a_normalised)
a_unique = set(a_tuples)

Apologies to David Eisenstat for not spotting his very similar answer.


回答 6

您可以像这样滚动一个列表:

list1, list2 = [0,1,1,1,0,0,1,0], [1,0,0,1,0,0,1,1]

str_list1="".join(map(str,list1))
str_list2="".join(map(str,list2))

def rotate(string_to_rotate, result=None):
    if result is None:  # avoid the shared mutable default argument
        result = []
    result.append(string_to_rotate)
    for i in xrange(1,len(string_to_rotate)):
        result.append(result[-1][1:]+result[-1][0])
    return result

for x in rotate(str_list1):
    if cmp(x,str_list2)==0:
        print "lists are rotationally identical"
        break

You can roll one list like this:

list1, list2 = [0,1,1,1,0,0,1,0], [1,0,0,1,0,0,1,1]

str_list1="".join(map(str,list1))
str_list2="".join(map(str,list2))

def rotate(string_to_rotate, result=None):
    if result is None:  # avoid the shared mutable default argument
        result = []
    result.append(string_to_rotate)
    for i in xrange(1,len(string_to_rotate)):
        result.append(result[-1][1:]+result[-1][0])
    return result

for x in rotate(str_list1):
    if cmp(x,str_list2)==0:
        print "lists are rotationally identical"
        break

回答 7

首先,把每一个列表元素(如有必要,在副本上操作)都转换为其字典序最大的那个旋转版本。

然后对得到的列表的列表进行排序(保留指向原始位置的索引),再对排序后的列表去重,并按需要把原始列表中的所有重复项标记出来。

First convert each of your list elements (in a copy if necessary) to the rotated version that is lexically greatest.

Then sort the resulting list of lists (retaining an index into the original list position) and unify the sorted list, marking all the duplicates in the original list as needed.
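A hedged Python sketch of that procedure (function and variable names are my own, not from the answer):

def canonical(lst):
    # the lexically greatest rotation serves as the canonical form of the cycle
    return max(tuple(lst[i:] + lst[:i]) for i in range(len(lst)))

lists = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 1], [0, 1, 1, 1, 0], [1, 0, 1, 0, 1]]
keyed = sorted((canonical(lst), idx) for idx, lst in enumerate(lists))
# adjacent entries with equal canonical forms are circular duplicates
for (ka, ia), (kb, ib) in zip(keyed, keyed[1:]):
    if ka == kb:
        print("list %d is a rotation of list %d" % (ib, ia))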


回答 8

借用 @SalvadorDali 的观察(在 b + b 中任何长度为 len(a) 的切片里寻找 a 的匹配),下面是一个只用列表操作的解决方案。

def rollmatch(a,b):
    bb=b*2
    return any(not any(ax^bbx for ax,bbx in zip(a,bb[i:])) for i in range(len(a)))

l1 = [1,0,0,1]
l2 = [1,1,0,0]
l3 = [1,0,1,0]

rollmatch(l1,l2)  # True
rollmatch(l1,l3)  # False

第二种方法:[已删除]

Piggybacking on @SalvadorDali's observation on looking for matches of a in any a-length slice of b+b, here is a solution using just list operations.

def rollmatch(a,b):
    bb=b*2
    return any(not any(ax^bbx for ax,bbx in zip(a,bb[i:])) for i in range(len(a)))

l1 = [1,0,0,1]
l2 = [1,1,0,0]
l3 = [1,0,1,0]

rollmatch(l1,l2)  # True
rollmatch(l1,l3)  # False

2nd approach: [deleted]


回答 9

这不是一个完整的,独立的答案,但是在通过减少比较来优化的主题上,我也正在考虑归一化的表示形式。

即,如果您输入的字母为{0,1},则可以大大减少允许的排列数量。将第一个列表旋转为(伪)归一化形式(考虑到问题中的分布,我将选择其中一个1位之一位于最左端,而0个位之一位于最右端的一个)。现在,在每次比较之前,先以相同的对齐方式在可能的位置上依次旋转其他列表。

例如,如果您总共有四个1位,则此对齐方式最多可以有4个排列,并且如果您具有相邻1位的簇,则此簇中的每个附加位都会减少位置数。

List 1   1 1 1 0 1 0

List 2   1 0 1 1 1 0  1st permutation
         1 1 1 0 1 0  2nd permutation, final permutation, match, done

这可以概括为较大的字母和不同的对齐方式。主要的挑战是找到仅包含少量可能表示的良好规范化。理想情况下,这将是具有单个唯一表示形式的适当规范化,但是鉴于问题,我认为这是不可能的。

Not a complete, free-standing answer, but on the topic of optimizing by reducing comparisons, I too was thinking of normalized representations.

Namely, if your input alphabet is {0, 1}, you could reduce the number of allowed permutations significantly. Rotate the first list to a (pseudo-) normalized form (given the distribution in your question, I would pick one where one of the 1 bits is on the extreme left, and one of the 0 bits is on the extreme right). Now before each comparison, successively rotate the other list through the possible positions with the same alignment pattern.

For example, if you have a total of four 1 bits, there can be at most 4 permutations with this alignment, and if you have clusters of adjacent 1 bits, each additional bit in such a cluster reduces the amount of positions.

List 1   1 1 1 0 1 0

List 2   1 0 1 1 1 0  1st permutation
         1 1 1 0 1 0  2nd permutation, final permutation, match, done

This generalizes to larger alphabets and different alignment patterns; the main challenge is to find a good normalization with only a few possible representations. Ideally, it would be a proper normalization, with a single unique representation, but given the problem, I don’t think that’s possible.


回答 10

进一步建立在RocketRoy的答案上:将所有列表预先转换为无符号的64位数字。对于每个列表,将其旋转55位以找到最小的数值。

现在,每个列表都剩下一个无符号的64位值,您可以直接将其与其他列表的值进行比较。不再需要函数is_circular_identical()。

(从本质上讲,您为每个列表创建了一个不受元素旋转影响的标识值。)即使列表中 1 的个数是任意的,这种做法也同样适用。

Building further on RocketRoy’s answer: Convert all your lists up front to unsigned 64 bit numbers. For each list, rotate those 55 bits around to find the smallest numerical value.

You are now left with a single unsigned 64 bit value for each list that you can compare straight with the value of the other lists. Function is_circular_identical() is not required anymore.

(In essence, you create an identity value for your lists that is not affected by the rotation of the lists elements) That would even work if you have an arbitrary number of one’s in your lists.
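A hedged Python sketch of such an identity value (plain Python ints standing in for uint64; names and widths are illustrative):

def ring_id(bits, width=55):
    # pack the 0/1 list into an integer, then take the smallest value over all rotations
    value = 0
    for b in bits:
        value = (value << 1) | b
    mask = (1 << width) - 1
    best = value
    for _ in range(width - 1):
        # rotate right by one bit within `width` bits
        value = ((value >> 1) | ((value & 1) << (width - 1))) & mask
        best = min(best, value)
    return best

a = [1, 1, 1, 0, 0]
b = [1, 1, 0, 0, 1]
print(ring_id(a, 5) == ring_id(b, 5))  # True: same ring, same identity value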


回答 11

这与 Salvador Dali 的想法相同,但不需要转换成字符串。背后同样是利用 KMP 的失配回退思想来避免检查不可能的移位。只需调用 KMPModified(list1, list2 + list2) 即可。

    public class KmpModified
    {
        public int[] CalculatePhi(int[] pattern)
        {
            var phi = new int[pattern.Length + 1];
            phi[0] = -1;
            phi[1] = 0;

            int pos = 1, cnd = 0;
            while (pos < pattern.Length)
                if (pattern[pos] == pattern[cnd])
                {
                    cnd++;
                    phi[pos + 1] = cnd;
                    pos++;
                }
                else if (cnd > 0)
                    cnd = phi[cnd];
                else
                {
                    phi[pos + 1] = 0;
                    pos++;
                }

            return phi;
        }

        public IEnumerable<int> Search(int[] pattern, int[] list)
        {
            var phi = CalculatePhi(pattern);

            int m = 0, i = 0;
            while (m < list.Length)
                if (pattern[i] == list[m])
                {
                    i++;
                    if (i == pattern.Length)
                    {
                        yield return m - i + 1;
                        i = phi[i];
                    }
                    m++;
                }
                else if (i > 0)
                {
                    i = phi[i];
                }
                else
                {
                    i = 0;
                    m++;
                }
        }

        [Fact]
        public void BasicTest()
        {
            var pattern = new[] { 1, 1, 10 };
            var list = new[] {2, 4, 1, 1, 1, 10, 1, 5, 1, 1, 10, 9};
            var matches = Search(pattern, list).ToList();

            Assert.Equal(new[] {3, 8}, matches);
        }

        [Fact]
        public void SolveProblem()
        {
            var random = new Random();
            var list = new int[10];
            for (var k = 0; k < list.Length; k++)
                list[k]= random.Next();

            var rotation = new int[list.Length];
            for (var k = 1; k < list.Length; k++)
                rotation[k - 1] = list[k];
            rotation[rotation.Length - 1] = list[0];

            Assert.True(Search(list, rotation.Concat(rotation).ToArray()).Any());
        }
    }

希望对您有所帮助!

This is the same idea as Salvador Dali's, but it doesn't need the string conversion. Behind it is the same KMP failure-function idea, used to avoid inspecting impossible shifts. Just call KMPModified(list1, list2 + list2).

    public class KmpModified
    {
        public int[] CalculatePhi(int[] pattern)
        {
            var phi = new int[pattern.Length + 1];
            phi[0] = -1;
            phi[1] = 0;

            int pos = 1, cnd = 0;
            while (pos < pattern.Length)
                if (pattern[pos] == pattern[cnd])
                {
                    cnd++;
                    phi[pos + 1] = cnd;
                    pos++;
                }
                else if (cnd > 0)
                    cnd = phi[cnd];
                else
                {
                    phi[pos + 1] = 0;
                    pos++;
                }

            return phi;
        }

        public IEnumerable<int> Search(int[] pattern, int[] list)
        {
            var phi = CalculatePhi(pattern);

            int m = 0, i = 0;
            while (m < list.Length)
                if (pattern[i] == list[m])
                {
                    i++;
                    if (i == pattern.Length)
                    {
                        yield return m - i + 1;
                        i = phi[i];
                    }
                    m++;
                }
                else if (i > 0)
                {
                    i = phi[i];
                }
                else
                {
                    i = 0;
                    m++;
                }
        }

        [Fact]
        public void BasicTest()
        {
            var pattern = new[] { 1, 1, 10 };
            var list = new[] {2, 4, 1, 1, 1, 10, 1, 5, 1, 1, 10, 9};
            var matches = Search(pattern, list).ToList();

            Assert.Equal(new[] {3, 8}, matches);
        }

        [Fact]
        public void SolveProblem()
        {
            var random = new Random();
            var list = new int[10];
            for (var k = 0; k < list.Length; k++)
                list[k]= random.Next();

            var rotation = new int[list.Length];
            for (var k = 1; k < list.Length; k++)
                rotation[k - 1] = list[k];
            rotation[rotation.Length - 1] = list[0];

            Assert.True(Search(list, rotation.Concat(rotation).ToArray()).Any());
        }
    }

Hope this helps!


回答 12

简化问题

  • 问题是由有序项目组成的列表
  • 取值域是二进制 (0,1)
  • 我们可以把连续的 1 映射为正的计数
  • 把连续的 0 映射为负的计数,从而简化问题

A = [ 1, 1, 1, 0, 0, 1, 1, 0 ]
B = [ 1, 1, 0, 1, 1, 1, 0, 0 ]
~
A = [ +3, -2, +2, -1 ]
B = [ +2, -1, +3, -2 ]
  • 此过程要求第一个项目和最后一个项目必须不同
  • 这将减少总体比较量

检查流程

  • 如果假设两个列表互为循环重复,那么我们就知道自己要找什么
  • 基本上,第一个列表的第一个项目必须出现在另一个列表的某个位置
  • 然后依次是第一个列表中后续的项目,以同样的方式对应
  • 排在它前面的项目则应当对应第一个列表末尾的项目
  • 由于是循环的,顺序保持一致

关键点

  • 这里的问题是从哪里开始,技术上称为 lookup 和 look-ahead
  • 我们只需在第二个列表中查找第一个列表首元素出现的位置
  • 由于我们已把列表映射成类似直方图的计数,出现频繁的元素概率较低

伪代码

FUNCTION IS_DUPLICATE (LIST L1, LIST L2) : BOOLEAN

    LIST A = MAP_LIST(L1)
    LIST B = MAP_LIST(L2)

    LIST ALPHA = LOOKUP_INDEX(B, A[0])

    IF A.SIZE != B.SIZE
       OR COUNT_CHAR(A, 0) != COUNT_CHAR(B, ALPHA[0]) THEN

        RETURN FALSE

    END IF

    FOR EACH INDEX IN ALPHA

        IF ALPHA_NGRAM(A, B, INDEX, 1) THEN

            IF IS_DUPLICATE(A, B, INDEX) THEN

                RETURN TRUE

            END IF

        END IF

    END FOR

    RETURN FALSE

END FUNCTION

FUNCTION IS_DUPLICATE (LIST L1, LIST L2, INTEGER INDEX) : BOOLEAN

    INTEGER I = 0

    WHILE I < L1.SIZE DO

        IF L1[I] != L2[(INDEX+I)%L2.SIZE] THEN

            RETURN FALSE

        END IF

        I = I + 1

    END WHILE

    RETURN TRUE

END FUNCTION

功能

  • MAP_LIST(LIST A):LIST 将连续相同的元素映射为新列表中的计数

  • LOOKUP_INDEX(LIST A, INTEGER E):LIST 返回元素 E 在列表 A 中出现位置的索引列表

  • COUNT_CHAR(LIST A, INTEGER E):INTEGER 统计元素 E 在列表 A 中出现的次数

  • ALPHA_NGRAM(LIST A, LIST B, INTEGER I, INTEGER N):BOOLEAN 检查 B[I] 处的 N-GRAM 在两个方向上是否与 A[0] 处的等价


最后

如果列表非常庞大,或者我们用来开始循环检查的元素出现得非常频繁,则可以执行以下操作:

  • 在第一个列表中选择出现频率最低的项目作为起点

  • 增大 n-gram 的 N 参数,以降低进入线性检查的可能性

Simplifying The Problem

  • The problem consist of list of ordered items
  • The domain of value is binary (0,1)
  • We can reduce the problem by mapping consecutive 1s into a count
  • and consecutive 0s into a negative count

Example

A = [ 1, 1, 1, 0, 0, 1, 1, 0 ]
B = [ 1, 1, 0, 1, 1, 1, 0, 0 ]
~
A = [ +3, -2, +2, -1 ]
B = [ +2, -1, +3, -2 ]
  • This process require that the first item and the last item must be different
  • This will reduce the amount of comparisons overall

Checking Process

  • If we assume that they’re duplicate, then we can assume what we are looking for
  • Basically the first item from the first list must exist somewhere in the other list
  • Followed by what is followed in the first list, and in the same manner
  • The previous items should be the last items from the first list
  • Since it’s circular, the order is the same

The Grip

  • The question here is where to start, technically known as lookup and look-ahead
  • We will just check where the first element of the first list exist through the second list
  • The probability of frequent element is lower given that we mapped the lists into histograms

Pseudo-Code

FUNCTION IS_DUPLICATE (LIST L1, LIST L2) : BOOLEAN

    LIST A = MAP_LIST(L1)
    LIST B = MAP_LIST(L2)

    LIST ALPHA = LOOKUP_INDEX(B, A[0])

    IF A.SIZE != B.SIZE
       OR COUNT_CHAR(A, 0) != COUNT_CHAR(B, ALPHA[0]) THEN

        RETURN FALSE

    END IF

    FOR EACH INDEX IN ALPHA

        IF ALPHA_NGRAM(A, B, INDEX, 1) THEN

            IF IS_DUPLICATE(A, B, INDEX) THEN

                RETURN TRUE

            END IF

        END IF

    END FOR

    RETURN FALSE

END FUNCTION

FUNCTION IS_DUPLICATE (LIST L1, LIST L2, INTEGER INDEX) : BOOLEAN

    INTEGER I = 0

    WHILE I < L1.SIZE DO

        IF L1[I] != L2[(INDEX+I)%L2.SIZE] THEN

            RETURN FALSE

        END IF

        I = I + 1

    END WHILE

    RETURN TRUE

END FUNCTION

Functions

  • MAP_LIST(LIST A):LIST MAP CONSECUTIVE ELEMENTS AS COUNTS IN A NEW LIST

  • LOOKUP_INDEX(LIST A, INTEGER E):LIST RETURN LIST OF INDICES WHERE THE ELEMENT E EXIST IN THE LIST A

  • COUNT_CHAR(LIST A , INTEGER E):INTEGER COUNT HOW MANY TIMES AN ELEMENT E OCCUR IN A LIST A

  • ALPHA_NGRAM(LIST A,LIST B,INTEGER I,INTEGER N):BOOLEAN CHECK IF B[I] IS EQUIVALENT TO A[0] N-GRAM IN BOTH DIRECTIONS


Finally

If the list size is going to be pretty huge, or if the element we start checking the cycle from occurs very frequently, then we can do the following (a Python sketch of the MAP_LIST step appears after this list):

  • Look for the least-frequent item in the first list to start with

  • Increase the n-gram N parameter to lower the probability of going through the linear check
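
As a hedged illustration of the MAP_LIST step described above, here is what the run-length mapping could look like in Python (a sketch, not the author's code; it assumes the first and last items differ, so no run wraps around):

def map_list(bits):
    """Collapse runs of 1s into positive counts and runs of 0s into negative
    counts, e.g. [1, 1, 1, 0, 0, 1, 1, 0] -> [3, -2, 2, -1]."""
    runs = []
    for b in bits:
        sign = 1 if b == 1 else -1
        if runs and (runs[-1] > 0) == (b == 1):
            runs[-1] += sign      # extend the current run
        else:
            runs.append(sign)     # start a new run
    return runs

print(map_list([1, 1, 1, 0, 0, 1, 1, 0]))  # [3, -2, 2, -1]
print(map_list([1, 1, 0, 1, 1, 1, 0, 0]))  # [2, -1, 3, -2]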


回答 13

可以为这些列表导出一种高效、易于计算的“规范形式”,方法如下:

  • 统计每两个 1 之间 0 的个数(忽略环绕),得到三个数字。
  • 旋转这三个数字,让最大的排在第一位。
  • 第一个数字(a)必然在 18 到 52 之间(含边界)。将其重新编码到 0 到 34 之间。
  • 第二个数字(b)必然在 0 到 26 之间,不过这无关紧要。
  • 舍弃第三个数字,因为它就是 52 - (a + b),不提供任何额外信息。

规范形式就是整数 b * 35 + a,介于 0 到 936 之间(含边界),非常紧凑(循环唯一的列表总共只有 477 个)。

An efficient, quick-to-compute “canonical form” for the lists in question can be derived as:

  • Count the number of zeroes between the ones (ignoring wrap-around), to get three numbers.
  • Rotate the three numbers so that the biggest number is first.
  • The first number (a) must be between 18 and 52 (inclusive). Re-encode it as between 0 and 34.
  • The second number (b) must be between 0 and 26, but it doesn’t matter much.
  • Drop the third number, since it’s just 52 - (a + b) and adds no information

The canonical form is the integer b * 35 + a, which is between 0 and 936 (inclusive), which is fairly compact (there are 477 circularly-unique lists in total).
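
A hedged Python sketch of one reading of this canonical form (it assumes exactly three 1s in a 55-element binary list, per the original question; the constants 18 and 35 come from the bullets above):

def canonical_form(bits):
    """Gap-based canonical form for a circular binary list with three 1s."""
    n = len(bits)
    ones = [i for i, b in enumerate(bits) if b == 1]
    # number of zeroes between consecutive ones, taken cyclically
    gaps = [(ones[(k + 1) % 3] - ones[k]) % n - 1 for k in range(3)]
    # rotate the three gaps so the biggest comes first (ties broken deterministically)
    a, b, _ = max(gaps[k:] + gaps[:k] for k in range(3))
    return b * 35 + (a - 18)

# Circularly identical lists produce the same integer, so grouping by
# canonical_form(lst) identifies the duplicates.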


回答 14

我写了一个简单直接的解决方案:它比较两个列表,并在每次迭代时递增(并回绕)第二个列表中被比较值的索引。

我不太了解python,所以我用Java编写了它,但是它非常简单,因此应该很容易将其适应任何其他语言。

这样,您还可以比较其他类型的列表。

public class Main {

    public static void main(String[] args){
        int[] a = {0,1,1,1,0};
        int[] b = {1,1,0,0,1};

        System.out.println(isCircularIdentical(a, b));
    }

    public static boolean isCircularIdentical(int[] a, int[]b){
        if(a.length != b.length){
            return false;
        }

        //The outer loop is for the increase of the index of the second list
        outer:
        for(int i = 0; i < a.length; i++){
            //Loop trough the list and compare each value to the according value of the second list
            for(int k = 0; k < a.length; k++){
                // I use modulo length to wrap around the index
                if(a[k] != b[(k + i) % a.length]){
                    //If the values do not match I continue and shift the index one further
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}

I wrote a straightforward solution which compares both lists and just increases (and wraps around) the index of the compared value for each iteration.

I don’t know python well so I wrote it in Java, but it’s really simple so it should be easy to adapt it to any other language.

By this you could also compare lists of other types.

public class Main {

    public static void main(String[] args){
        int[] a = {0,1,1,1,0};
        int[] b = {1,1,0,0,1};

        System.out.println(isCircularIdentical(a, b));
    }

    public static boolean isCircularIdentical(int[] a, int[]b){
        if(a.length != b.length){
            return false;
        }

        //The outer loop is for the increase of the index of the second list
        outer:
        for(int i = 0; i < a.length; i++){
            //Loop trough the list and compare each value to the according value of the second list
            for(int k = 0; k < a.length; k++){
                // I use modulo length to wrap around the index
                if(a[k] != b[(k + i) % a.length]){
                    //If the values do not match I continue and shift the index one further
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}

回答 15

正如其他人提到的那样,一旦找到列表的标准化轮换,就可以对其进行比较。

这是执行此操作的一些工作代码,基本方法是为每个列表找到标准化的轮换并进行比较:

  • 在每个列表上计算归一化的旋转索引。
  • 循环使用两个列表及其偏移量,比较每个项目,如果它们不匹配则返回。

请注意,此方法不依赖于数字,您可以传入字符串列表(可以比较的任何值)。

我们知道希望列表从最小值开始,因此无需做“列表中找列表”的搜索;只需遍历各个最小值的位置,比较它们后续的值,找出后续值最小的那个位置,并保留它用于进一步比较,直到得到最佳结果。

计算索引时有很多提早退出的机会,下面是一些优化的细节。

  • 如果只有一个,则跳过搜索最佳最小值。
  • 当前一个也是最小值时,跳过搜索最小值(永远不会是更好的匹配项)。
  • 当所有值都相同时,跳过搜索。
  • 列表具有不同的最小值时,较早失败。
  • 偏移量匹配时使用常规比较。
  • 调整偏移量以避免在比较期间将索引值包装在列表之一上。

请注意,在Python中,列表中搜索可能会更快,但是我很想找到一种有效的算法-也可以在其他语言中使用该算法。同样,避免创建新列表也有一些优势。

def normalize_rotation_index(ls, v_min_other=None):
    """ Return the index or -1 (when the minimum is above `v_min_other`) """

    if len(ls) <= 1:
        return 0

    def compare_rotations(i_a, i_b):
        """ Return True when i_a is smaller.
            Note: unless there are large duplicate sections of identical values,
            this loop will exit early on.
        """
        for offset in range(1, len(ls)):
            v_a = ls[(i_a + offset) % len(ls)]
            v_b = ls[(i_b + offset) % len(ls)]
            if v_a < v_b:
                return True
            elif v_a > v_b:
                return False
        return False

    v_min = ls[0]
    i_best_first = 0
    i_best_last = 0
    i_best_total = 1
    for i in range(1, len(ls)):
        v = ls[i]
        if v_min > v:
            v_min = v
            i_best_first = i
            i_best_last = i
            i_best_total = 1
        elif v_min == v:
            i_best_last = i
            i_best_total += 1

    # all values match
    if i_best_total == len(ls):
        return 0

    # exit early if we're not matching another lists minimum
    if v_min_other is not None:
        if v_min != v_min_other:
            return -1
    # simple case, only one minimum
    if i_best_first == i_best_last:
        return i_best_first

    # otherwise find the minimum with the lowest values compared to all others.
    # start looking after the first we've found
    i_best = i_best_first
    for i in range(i_best_first + 1, i_best_last + 1):
        if (ls[i] == v_min) and (ls[i - 1] != v_min):
            if compare_rotations(i, i_best):
                i_best = i

    return i_best


def compare_circular_lists(ls_a, ls_b):
    # sanity checks
    if len(ls_a) != len(ls_b):
        return False
    if len(ls_a) <= 1:
        return (ls_a == ls_b)

    index_a = normalize_rotation_index(ls_a)
    index_b = normalize_rotation_index(ls_b, ls_a[index_a])

    if index_b == -1:
        return False

    if index_a == index_b:
        return (ls_a == ls_b)

    # cancel out 'index_a'
    index_b = (index_b - index_a)
    if index_b < 0:
        index_b += len(ls_a)
    index_a = 0  # ignore it

    # compare rotated lists
    for i in range(len(ls_a)):
        if ls_a[i] != ls_b[(index_b + i) % len(ls_b)]:
            return False
    return True


assert(compare_circular_lists([0, 9, -1, 2, -1], [-1, 2, -1, 0, 9]) == True)
assert(compare_circular_lists([2, 9, -1, 0, -1], [-1, 2, -1, 0, 9]) == False)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["World", "Hello" "Circular"]) == True)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["Circular", "Hello" "World"]) == False)

请参阅:此片段以获取更多测试/示例。

As others have mentioned, once you find the normalized rotation of a list, you can compare them.

Here's some working code that does this. The basic method is to find a normalized rotation for each list and compare them:

  • Calculate a normalized rotation index on each list.
  • Loop over both lists with their offsets, comparing each item and returning as soon as they mismatch.

Note that this method doesn't depend on numbers; you can pass in lists of strings (any values which can be compared).

Instead of doing a list-in-list search, we know we want the list to start with the minimum value – so we can loop over the minimum values, searching until we find which one has the lowest successive values, storing this for further comparisons until we have the best.

There are many opportunities to exit early when calculating the index; details on some of the optimizations follow.

  • Skip searching for the best minimum value when there's only one.
  • Skip searching minimum values when the previous is also a minimum value (it will never be a better match).
  • Skip searching when all values are the same.
  • Fail early when lists have different minimum values.
  • Use regular comparison when offsets match.
  • Adjust offsets to avoid wrapping the index values on one of the lists during comparison.

Note that in Python a list-in-list search may well be faster; however, I was interested in finding an efficient algorithm which could be used in other languages too. Also, there is some advantage to avoiding creating new lists.

def normalize_rotation_index(ls, v_min_other=None):
    """ Return the index or -1 (when the minimum is above `v_min_other`) """

    if len(ls) <= 1:
        return 0

    def compare_rotations(i_a, i_b):
        """ Return True when i_a is smaller.
            Note: unless there are large duplicate sections of identical values,
            this loop will exit early on.
        """
        for offset in range(1, len(ls)):
            v_a = ls[(i_a + offset) % len(ls)]
            v_b = ls[(i_b + offset) % len(ls)]
            if v_a < v_b:
                return True
            elif v_a > v_b:
                return False
        return False

    v_min = ls[0]
    i_best_first = 0
    i_best_last = 0
    i_best_total = 1
    for i in range(1, len(ls)):
        v = ls[i]
        if v_min > v:
            v_min = v
            i_best_first = i
            i_best_last = i
            i_best_total = 1
        elif v_min == v:
            i_best_last = i
            i_best_total += 1

    # all values match
    if i_best_total == len(ls):
        return 0

    # exit early if we're not matching another lists minimum
    if v_min_other is not None:
        if v_min != v_min_other:
            return -1
    # simple case, only one minimum
    if i_best_first == i_best_last:
        return i_best_first

    # otherwise find the minimum with the lowest values compared to all others.
    # start looking after the first we've found
    i_best = i_best_first
    for i in range(i_best_first + 1, i_best_last + 1):
        if (ls[i] == v_min) and (ls[i - 1] != v_min):
            if compare_rotations(i, i_best):
                i_best = i

    return i_best


def compare_circular_lists(ls_a, ls_b):
    # sanity checks
    if len(ls_a) != len(ls_b):
        return False
    if len(ls_a) <= 1:
        return (ls_a == ls_b)

    index_a = normalize_rotation_index(ls_a)
    index_b = normalize_rotation_index(ls_b, ls_a[index_a])

    if index_b == -1:
        return False

    if index_a == index_b:
        return (ls_a == ls_b)

    # cancel out 'index_a'
    index_b = (index_b - index_a)
    if index_b < 0:
        index_b += len(ls_a)
    index_a = 0  # ignore it

    # compare rotated lists
    for i in range(len(ls_a)):
        if ls_a[i] != ls_b[(index_b + i) % len(ls_b)]:
            return False
    return True


assert(compare_circular_lists([0, 9, -1, 2, -1], [-1, 2, -1, 0, 9]) == True)
assert(compare_circular_lists([2, 9, -1, 0, -1], [-1, 2, -1, 0, 9]) == False)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["World", "Hello" "Circular"]) == True)
assert(compare_circular_lists(["Hello" "Circular", "World"], ["Circular", "Hello" "World"]) == False)

See: this snippet for some more tests/examples.


回答 16

您可以很容易地检查列表A是否等于列表B在预期O(N)时间中的循环移位。

我将使用多项式哈希函数来计算列表A的哈希以及列表B的每个循环移位。如果列表B的移位具有与列表A相同的哈希,则将比较实际元素以查看它们是否相等。

之所以如此之快,是因为使用多项式哈希函数(这非常常见!)时,可以在常数时间内由上一个循环移位的哈希计算出下一个,因此可以在 O(N) 时间内算出所有循环移位的哈希。

它是这样的:

假设B具有N个元素,那么使用质数P的B的哈希为:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb = Hb*P + B[i];
}

这是一种评估P中的多项式的优化方法,等效于:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb += B[i] * P^(N-1-i);  //^ is exponentiation, not XOR
}

注意每个 B[i] 都乘以 P^(N-1-i)。如果把 B 向左移 1 位,那么除第一个元素外,每个 B[i] 都会多乘一个 P。由于乘法对加法满足分配律,只需把整个哈希乘以 P 就能一次性乘上所有分量,然后再修正第一个元素的系数即可。

B左移的哈希只是

Hb1 = Hb*P + B[0]*(1-(P^N))

第二个左移:

Hb2 = Hb1*P + B[1]*(1-(P^N))

等等…

注意:上面的所有数学运算都是对某个机器字长取模进行的,并且 P^N 只需计算一次。

You can check to see if a list A is equal to a cyclic shift of list B in expected O(N) time pretty easily.

I would use a polynomial hash function to compute the hash of list A, and every cyclic shift of list B. Where a shift of list B has the same hash as list A, I’d compare the actual elements to see if they are equal.

The reason this is fast is that with polynomial hash functions (which are extremely common!), you can calculate the hash of each cyclic shift from the previous one in constant time, so you can calculate hashes for all of the cyclic shifts in O(N) time.

It works like this:

Let’s say B has N elements; then the hash of B using prime P is:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb = Hb*P + B[i];
}

This is an optimized way to evaluate a polynomial in P, and is equivalent to:

Hb=0;
for (i=0; i<N ; i++)
{
    Hb += B[i] * P^(N-1-i);  //^ is exponentiation, not XOR
}

Notice how every B[i] is multiplied by P^(N-1-i). If we shift B to the left by 1, then every B[i] will be multiplied by an extra P, except the first one. Since multiplication distributes over addition, we can multiply all the components at once just by multiplying the whole hash by P, and then fix up the factor for the first element.

The hash of the left shift of B is just

Hb1 = Hb*P + B[0]*(1-(P^N))

The second left shift:

Hb2 = Hb1*P + B[1]*(1-(P^N))

and so on…

NOTE: all math above is performed modulo some machine word size, and you only have to calculate P^N once.
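
A minimal Python sketch of this rolling-hash idea (the prime P and the modulus are illustrative choices, not from the answer; a hash match is still confirmed with a direct comparison, as the answer describes):

def is_cyclic_shift(a, b, P=1000003, MOD=(1 << 61) - 1):
    """Return True if list a equals some cyclic shift of list b,
    in expected O(N) time using a polynomial rolling hash."""
    n = len(a)
    if n != len(b):
        return False

    def poly_hash(xs):
        h = 0
        for x in xs:
            h = (h * P + x) % MOD
        return h

    ha = poly_hash(a)
    hb = poly_hash(b)          # hash of the current shift of b
    pn = pow(P, n, MOD)        # P^N, computed once
    for shift in range(n):
        if hb == ha and a == b[shift:] + b[:shift]:
            return True        # hash match confirmed by an element-wise check
        # hash of the next left shift: Hb*P + B[shift]*(1 - P^N)
        hb = (hb * P + b[shift] * (1 - pn)) % MOD
    return False

print(is_cyclic_shift([0, 1, 1, 1, 0], [1, 1, 0, 0, 1]))   # True
print(is_cyclic_shift([0, 1, 1, 1, 0], [1, 1, 0, 1, 0]))   # False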


回答 17

要坚持最 Pythonic 的做法,请使用 set!

from sets import Set
a = Set ([1, 1, 1, 0, 0])
b = Set ([0, 1, 1, 1, 0]) 
c = Set ([1, 0, 0, 1, 1])
a==b
True
a==b==c
True

To stick to the most Pythonic way to do it, use sets!

from sets import Set
a = Set ([1, 1, 1, 0, 0])
b = Set ([0, 1, 1, 1, 0]) 
c = Set ([1, 0, 0, 1, 1])
a==b
True
a==b==c
True

检查列表是否已排序的Python方法

问题:检查列表是否已排序的Python方法

有没有一种 Pythonic 的方法,检查列表是否已经按 ASC 或 DESC 排序?

listtimestamps = [1, 2, 3, 5, 6, 7]

类似 listtimestamps.isSorted() 这样的东西,返回 True 或 False。

我想输入一些消息的时间戳列表,并检查事务是否以正确的顺序出现。

Is there a pythonic way to check if a list is already sorted in ASC or DESC

listtimestamps = [1, 2, 3, 5, 6, 7]

something like listtimestamps.isSorted() that returns True or False.

I want to input a list of timestamps for some messages and check if the transactions appeared in the correct order.


回答 0

实际上,我们没有给出anijhaw寻找的答案。这是一行代码:

all(l[i] <= l[i+1] for i in xrange(len(l)-1))

对于Python 3:

all(l[i] <= l[i+1] for i in range(len(l)-1))

Actually we are not giving the answer anijhaw is looking for. Here is the one liner:

all(l[i] <= l[i+1] for i in xrange(len(l)-1))

For Python 3:

all(l[i] <= l[i+1] for i in range(len(l)-1))

回答 1

我只会用

if sorted(lst) == lst:
    # code here

除非这是一个很大的列表,否则您可能需要创建一个自定义函数。

如果您只是要对它进行排序(如果未排序的话),那么请忘记对它进行排序。

lst.sort()

并不要考虑太多。

如果您想使用自定义功能,可以执行以下操作

def is_sorted(lst, key=lambda x: x):
    for i, el in enumerate(lst[1:]):
        if key(el) < key(lst[i]): # i is the index of the previous element
            return False
    return True

不过,如果列表已经有序,这也需要 O(n)(而且是 Python for 循环里的 O(n)!)。因此,除非您预期列表大多数时候都是无序的(而且相当随机),否则我还是建议直接对列表排序。

I would just use

if sorted(lst) == lst:
    # code here

unless it’s a very big list in which case you might want to create a custom function.

if you are just going to sort it if it’s not sorted, then forget the check and sort it.

lst.sort()

and don’t think about it too much.

if you want a custom function, you can do something like

def is_sorted(lst, key=lambda x: x):
    for i, el in enumerate(lst[1:]):
        if key(el) < key(lst[i]): # i is the index of the previous element
            return False
    return True

This will be O(n) if the list is already sorted though (and O(n) in a for loop at that!) so, unless you expect it to be not sorted (and fairly random) most of the time, I would, again, just sort the list.


回答 2

该迭代器形式比使用整数索引快10-15%:

# python2 only
if str is bytes:
    from itertools import izip as zip

def is_sorted(l):
    return all(a <= b for a, b in zip(l, l[1:]))

This iterator form is 10-15% faster than using integer indexing:

# python2 only
if str is bytes:
    from itertools import izip as zip

def is_sorted(l):
    return all(a <= b for a, b in zip(l, l[1:]))

回答 3

实现此目的的一种优雅方法是使用 itertools 中的 imap 函数:

from itertools import imap, tee
import operator

def is_sorted(iterable, compare=operator.le):
  a, b = tee(iterable)
  next(b, None)
  return all(imap(compare, a, b))

这种实现速度很快,并且适用于任何可迭代对象。

A beautiful way to implement this is to use the imap function from itertools:

from itertools import imap, tee
import operator

def is_sorted(iterable, compare=operator.le):
  a, b = tee(iterable)
  next(b, None)
  return all(imap(compare, a, b))

This implementation is fast and works on any iterables.


回答 4

我运行了一个基准测试:对于长列表,sorted(lst, reverse=True) == lst 最快;对于短列表,all(l[i] >= l[i+1] for i in xrange(len(l)-1)) 最快。这些基准测试在 MacBook Pro 2010 13 英寸(Core2 Duo 2.66GHz,4GB 1067MHz DDR3 RAM,Mac OS X 10.6.5)上运行。

更新:我修改了脚本,以便您可以在自己的系统上直接运行它。先前的版本存在错误。另外,我添加了已排序和未排序的输入。

  • 最适合短的已排序列表: all(l[i] >= l[i+1] for i in xrange(len(l)-1))
  • 最适合长排序列表: sorted(l, reverse=True) == l
  • 最适合简短的未排序列表: all(l[i] >= l[i+1] for i in xrange(len(l)-1))
  • 最适合长时间未排序的列表: all(l[i] >= l[i+1] for i in xrange(len(l)-1))

因此,在大多数情况下,都有明显的赢家。

更新: aaronsterling的答案(第6和第7名)实际上在所有情况下都是最快的。#7最快,因为它没有间接层来查找密钥。

#!/usr/bin/env python

import itertools
import time

def benchmark(f, *args):
    t1 = time.time()
    for i in xrange(1000000):
        f(*args)
    t2 = time.time()
    return t2-t1

L1 = range(4, 0, -1)
L2 = range(100, 0, -1)
L3 = range(0, 4)
L4 = range(0, 100)

# 1.
def isNonIncreasing(l, key=lambda x,y: x >= y): 
    return all(key(l[i],l[i+1]) for i in xrange(len(l)-1))
print benchmark(isNonIncreasing, L1) # 2.47253704071
print benchmark(isNonIncreasing, L2) # 34.5398209095
print benchmark(isNonIncreasing, L3) # 2.1916718483
print benchmark(isNonIncreasing, L4) # 2.19576501846

# 2.
def isNonIncreasing(l):
    return all(l[i] >= l[i+1] for i in xrange(len(l)-1))
print benchmark(isNonIncreasing, L1) # 1.86919999123
print benchmark(isNonIncreasing, L2) # 21.8603689671
print benchmark(isNonIncreasing, L3) # 1.95684289932
print benchmark(isNonIncreasing, L4) # 1.95272517204

# 3.
def isNonIncreasing(l, key=lambda x,y: x >= y): 
    return all(key(a,b) for (a,b) in itertools.izip(l[:-1],l[1:]))
print benchmark(isNonIncreasing, L1) # 2.65468883514
print benchmark(isNonIncreasing, L2) # 29.7504849434
print benchmark(isNonIncreasing, L3) # 2.78062295914
print benchmark(isNonIncreasing, L4) # 3.73436689377

# 4.
def isNonIncreasing(l):
    return all(a >= b for (a,b) in itertools.izip(l[:-1],l[1:]))
print benchmark(isNonIncreasing, L1) # 2.06947803497
print benchmark(isNonIncreasing, L2) # 15.6351969242
print benchmark(isNonIncreasing, L3) # 2.45671010017
print benchmark(isNonIncreasing, L4) # 3.48461818695

# 5.
def isNonIncreasing(l):
    return sorted(l, reverse=True) == l
print benchmark(isNonIncreasing, L1) # 2.01579380035
print benchmark(isNonIncreasing, L2) # 5.44593787193
print benchmark(isNonIncreasing, L3) # 2.01813793182
print benchmark(isNonIncreasing, L4) # 4.97615599632

# 6.
def isNonIncreasing(l, key=lambda x, y: x >= y): 
    for i, el in enumerate(l[1:]):
        if key(el, l[i-1]):
            return False
    return True
print benchmark(isNonIncreasing, L1) # 1.06842684746
print benchmark(isNonIncreasing, L2) # 1.67291283607
print benchmark(isNonIncreasing, L3) # 1.39491200447
print benchmark(isNonIncreasing, L4) # 1.80557894707

# 7.
def isNonIncreasing(l):
    for i, el in enumerate(l[1:]):
        if el >= l[i-1]:
            return False
    return True
print benchmark(isNonIncreasing, L1) # 0.883186101913
print benchmark(isNonIncreasing, L2) # 1.42852401733
print benchmark(isNonIncreasing, L3) # 1.09229516983
print benchmark(isNonIncreasing, L4) # 1.59502696991

I ran a benchmark and sorted(lst, reverse=True) == lst was the fastest for long lists, and all(l[i] >= l[i+1] for i in xrange(len(l)-1)) was the fastest for short lists. These benchmarks were run on a MacBook Pro 2010 13″ (Core2 Duo 2.66GHz, 4GB 1067MHz DDR3 RAM, Mac OS X 10.6.5).

UPDATE: I revised the script so that you can run it directly on your own system. The previous version had bugs. Also, I have added both sorted and unsorted inputs.

  • Best for short sorted lists: all(l[i] >= l[i+1] for i in xrange(len(l)-1))
  • Best for long sorted lists: sorted(l, reverse=True) == l
  • Best for short unsorted lists: all(l[i] >= l[i+1] for i in xrange(len(l)-1))
  • Best for long unsorted lists: all(l[i] >= l[i+1] for i in xrange(len(l)-1))

So in most cases there is a clear winner.

UPDATE: aaronsterling’s answers (#6 and #7) are actually the fastest in all cases. #7 is the fastest because it doesn’t have a layer of indirection to lookup the key.

#!/usr/bin/env python

import itertools
import time

def benchmark(f, *args):
    t1 = time.time()
    for i in xrange(1000000):
        f(*args)
    t2 = time.time()
    return t2-t1

L1 = range(4, 0, -1)
L2 = range(100, 0, -1)
L3 = range(0, 4)
L4 = range(0, 100)

# 1.
def isNonIncreasing(l, key=lambda x,y: x >= y): 
    return all(key(l[i],l[i+1]) for i in xrange(len(l)-1))
print benchmark(isNonIncreasing, L1) # 2.47253704071
print benchmark(isNonIncreasing, L2) # 34.5398209095
print benchmark(isNonIncreasing, L3) # 2.1916718483
print benchmark(isNonIncreasing, L4) # 2.19576501846

# 2.
def isNonIncreasing(l):
    return all(l[i] >= l[i+1] for i in xrange(len(l)-1))
print benchmark(isNonIncreasing, L1) # 1.86919999123
print benchmark(isNonIncreasing, L2) # 21.8603689671
print benchmark(isNonIncreasing, L3) # 1.95684289932
print benchmark(isNonIncreasing, L4) # 1.95272517204

# 3.
def isNonIncreasing(l, key=lambda x,y: x >= y): 
    return all(key(a,b) for (a,b) in itertools.izip(l[:-1],l[1:]))
print benchmark(isNonIncreasing, L1) # 2.65468883514
print benchmark(isNonIncreasing, L2) # 29.7504849434
print benchmark(isNonIncreasing, L3) # 2.78062295914
print benchmark(isNonIncreasing, L4) # 3.73436689377

# 4.
def isNonIncreasing(l):
    return all(a >= b for (a,b) in itertools.izip(l[:-1],l[1:]))
print benchmark(isNonIncreasing, L1) # 2.06947803497
print benchmark(isNonIncreasing, L2) # 15.6351969242
print benchmark(isNonIncreasing, L3) # 2.45671010017
print benchmark(isNonIncreasing, L4) # 3.48461818695

# 5.
def isNonIncreasing(l):
    return sorted(l, reverse=True) == l
print benchmark(isNonIncreasing, L1) # 2.01579380035
print benchmark(isNonIncreasing, L2) # 5.44593787193
print benchmark(isNonIncreasing, L3) # 2.01813793182
print benchmark(isNonIncreasing, L4) # 4.97615599632

# 6.
def isNonIncreasing(l, key=lambda x, y: x >= y): 
    for i, el in enumerate(l[1:]):
        if key(el, l[i-1]):
            return False
    return True
print benchmark(isNonIncreasing, L1) # 1.06842684746
print benchmark(isNonIncreasing, L2) # 1.67291283607
print benchmark(isNonIncreasing, L3) # 1.39491200447
print benchmark(isNonIncreasing, L4) # 1.80557894707

# 7.
def isNonIncreasing(l):
    for i, el in enumerate(l[1:]):
        if el >= l[i-1]:
            return False
    return True
print benchmark(isNonIncreasing, L1) # 0.883186101913
print benchmark(isNonIncreasing, L2) # 1.42852401733
print benchmark(isNonIncreasing, L3) # 1.09229516983
print benchmark(isNonIncreasing, L4) # 1.59502696991

回答 5

我会这样做(借鉴了这里的很多答案 [Aaron Sterling、Wai Yip Tung,某种程度上还有 Paul McGuire],主要是 Armin Ronacher 的做法):

from itertools import tee, izip

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

def is_sorted(iterable, key=lambda a, b: a <= b):
    return all(key(a, b) for a, b in pairwise(iterable))

一个好处:您不必为该序列具体化出第二个可迭代对象(与列表切片不同)。

I’d do this (stealing from a lot of answers here [Aaron Sterling, Wai Yip Tung, sorta from Paul McGuire] and mostly Armin Ronacher):

from itertools import tee, izip

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

def is_sorted(iterable, key=lambda a, b: a <= b):
    return all(key(a, b) for a, b in pairwise(iterable))

One nice thing: you don’t have to realize the second iterable for the series (unlike with a list slice).


回答 6

我使用基于numpy.diff()的这种单线:

def issorted(x):
    """Check if x is sorted"""
    return (numpy.diff(x) >= 0).all() # is diff between all consecutive entries >= 0?

我还没有真正针对任何其他方法计时,但是我认为它比任何纯Python方法都快,尤其是对于大n而言,因为numpy.diff中的循环(可能)直接在C中运行(n-1个减法,后跟n个) -1比较)。

但是,如果x是无符号的int,则需要小心,这可能会导致numpy.diff()中的无声整数下溢,从而导致误报。这是修改后的版本:

def issorted(x):
    """Check if x is sorted"""
    try:
        if x.dtype.kind == 'u':
            # x is unsigned int array, risk of int underflow in np.diff
            x = numpy.int64(x)
    except AttributeError:
        pass # no dtype, not an array
    return (numpy.diff(x) >= 0).all()

I use this one-liner based on numpy.diff():

def issorted(x):
    """Check if x is sorted"""
    return (numpy.diff(x) >= 0).all() # is diff between all consecutive entries >= 0?

I haven’t really timed it against any other method, but I assume it’s faster than any pure Python method, especially for large n, since the loop in numpy.diff (probably) runs directly in C (n-1 subtractions followed by n-1 comparisons).

However, you need to be careful if x is an unsigned int, which might cause silent integer underflow in numpy.diff(), resulting in a false positive. Here’s a modified version:

def issorted(x):
    """Check if x is sorted"""
    try:
        if x.dtype.kind == 'u':
            # x is unsigned int array, risk of int underflow in np.diff
            x = numpy.int64(x)
    except AttributeError:
        pass # no dtype, not an array
    return (numpy.diff(x) >= 0).all()

回答 7

这类似于最佳答案,但我更喜欢它,因为它避免了显式索引。假设列表名称为lst,则可以
(item, next_item)使用zip以下命令从列表中生成元组:

all(x <= y for x,y in zip(lst, lst[1:]))

在Python 3中,zip已经返回了生成器;在Python 2中,可以使用它itertools.izip来提高内存效率。

小型演示:

>>> lst = [1, 2, 3, 4]
>>> zip(lst, lst[1:])
[(1, 2), (2, 3), (3, 4)]
>>> all(x <= y for x,y in zip(lst, lst[1:]))
True
>>> 
>>> lst = [1, 2, 3, 2]
>>> zip(lst, lst[1:])
[(1, 2), (2, 3), (3, 2)]
>>> all(x <= y for x,y in zip(lst, lst[1:]))
False

(3, 2)评估元组时,最后一个失败。

奖励:检查无法索引的有限(!)生成器:

>>> def gen1():
...     yield 1
...     yield 2
...     yield 3
...     yield 4
...     
>>> def gen2():
...     yield 1
...     yield 2
...     yield 4
...     yield 3
... 
>>> g1_1 = gen1()
>>> g1_2 = gen1()
>>> next(g1_2)
1
>>> all(x <= y for x,y in zip(g1_1, g1_2))
True
>>>
>>> g2_1 = gen2()
>>> g2_2 = gen2()
>>> next(g2_2)
1
>>> all(x <= y for x,y in zip(g2_1, g2_2))
False

itertools.izip如果您使用的是Python 2,请确保在此处使用,否则您将失去不必从生成器创建列表的目的。

This is similar to the top answer, but I like it better because it avoids explicit indexing. Assuming your list has the name lst, you can generate
(item, next_item) tuples from your list with zip:

all(x <= y for x,y in zip(lst, lst[1:]))

In Python 3, zip already returns a generator, in Python 2 you can use itertools.izip for better memory efficiency.

Small demo:

>>> lst = [1, 2, 3, 4]
>>> zip(lst, lst[1:])
[(1, 2), (2, 3), (3, 4)]
>>> all(x <= y for x,y in zip(lst, lst[1:]))
True
>>> 
>>> lst = [1, 2, 3, 2]
>>> zip(lst, lst[1:])
[(1, 2), (2, 3), (3, 2)]
>>> all(x <= y for x,y in zip(lst, lst[1:]))
False

The last one fails when the tuple (3, 2) is evaluated.

Bonus: checking finite (!) generators which cannot be indexed:

>>> def gen1():
...     yield 1
...     yield 2
...     yield 3
...     yield 4
...     
>>> def gen2():
...     yield 1
...     yield 2
...     yield 4
...     yield 3
... 
>>> g1_1 = gen1()
>>> g1_2 = gen1()
>>> next(g1_2)
1
>>> all(x <= y for x,y in zip(g1_1, g1_2))
True
>>>
>>> g2_1 = gen2()
>>> g2_2 = gen2()
>>> next(g2_2)
1
>>> all(x <= y for x,y in zip(g2_1, g2_2))
False

Make sure to use itertools.izip here if you are using Python 2, otherwise you would defeat the purpose of not having to create lists from the generators.


回答 8

SapphireSun是完全正确的。您可以使用lst.sort()。Python的排序实现(TimSort)检查列表是否已排序。如果这样,sort()将在线性时间内完成。听起来像是一种Python方式,可以确保对列表进行排序;)

SapphireSun is quite right. You can just use lst.sort(). Python’s sort implementation (TimSort) checks if the list is already sorted. If so, sort() will complete in linear time. Sounds like a Pythonic way to ensure a list is sorted ;)


回答 9

尽管我认为并不能保证内置的 sorted 会按 (i+1, i) 的顺序调用其 cmp 函数,但对 CPython 来说似乎确实如此。

因此,您可以执行以下操作:

def my_cmp(x, y):
   cmpval = cmp(x, y)
   if cmpval < 0:
      raise ValueError
   return cmpval

def is_sorted(lst):
   try:
      sorted(lst, cmp=my_cmp)
      return True
   except ValueError:
      return False

print is_sorted([1,2,3,5,6,7])
print is_sorted([1,2,5,3,6,7])

或者这样(不用 if 语句,EAFP 用错了地方?;-)):

def my_cmp(x, y):
   assert(x >= y)
   return -1

def is_sorted(lst):
   try:
      sorted(lst, cmp=my_cmp)
      return True
   except AssertionError:
      return False

Although I don’t think there is a guarantee for that the sorted built-in calls its cmp function with i+1, i, it does seem to do so for CPython.

So you could do something like:

def my_cmp(x, y):
   cmpval = cmp(x, y)
   if cmpval < 0:
      raise ValueError
   return cmpval

def is_sorted(lst):
   try:
      sorted(lst, cmp=my_cmp)
      return True
   except ValueError:
      return False

print is_sorted([1,2,3,5,6,7])
print is_sorted([1,2,5,3,6,7])

Or this way (without if statements -> EAFP gone wrong? ;-) ):

def my_cmp(x, y):
   assert(x >= y)
   return -1

def is_sorted(lst):
   try:
      sorted(lst, cmp=my_cmp)
      return True
   except AssertionError:
      return False

回答 10

完全不是Pythonic,但我们至少需要一个reduce()答案,对吗?

def is_sorted(iterable):
    prev_or_inf = lambda prev, i: i if prev <= i else float('inf')
    return reduce(prev_or_inf, iterable, float('-inf')) < float('inf')

累加器变量仅存储该最后检查的值,如果任何值小于前一个值,则累加器将设置为无穷大(因此,最后的值仍然为无穷大,因为“先前的值”将始终大于当前的)。

Not very Pythonic at all, but we need at least one reduce() answer, right?

def is_sorted(iterable):
    prev_or_inf = lambda prev, i: i if prev <= i else float('inf')
    return reduce(prev_or_inf, iterable, float('-inf')) < float('inf')

The accumulator variable simply stores that last-checked value, and if any value is smaller than the previous value, the accumulator is set to infinity (and thus will still be infinity at the end, since the ‘previous value’ will always be bigger than the current one).


回答 11

正如 @aaronsterling 所指出的,以下解决方案最短,并且当数组已排序且不太小时似乎最快:def is_sorted(lst): return (sorted(lst) == lst)

如果大多数情况下未对数组进行排序,则希望使用不扫描整个数组并在发现未排序前缀后立即返回False的解决方案。以下是我能找到的最快的解决方案,它并不是特别优雅:

def is_sorted(lst):
    it = iter(lst)
    try:
        prev = it.next()
    except StopIteration:
        return True
    for x in it:
        if prev > x:
            return False
        prev = x
    return True

使用 Nathan Farrington 的基准测试,该方案在所有情况下的运行时间都优于 sorted(lst),唯独在大型已排序列表上除外。

这是我计算机上的基准测试结果。

sorted(lst)== lst解决方案

  • L1:1.23838591576
  • L2:4.19063091278
  • L3:1.17996287346
  • L4:4.68399500847

第二种解决方案:

  • L1:0.81095790863
  • L2:0.802397012711
  • L3:1.06135106087
  • L4:8.82761001587

As noted by @aaronsterling the following solution is the shortest and seems fastest when the array is sorted and not too small: def is_sorted(lst): return (sorted(lst) == lst)

If most of the time the array is not sorted, it would be desirable to use a solution that does not scan the entire array and returns False as soon as an unsorted prefix is discovered. Following is the fastest solution I could find, it is not particularly elegant:

def is_sorted(lst):
    it = iter(lst)
    try:
        prev = it.next()
    except StopIteration:
        return True
    for x in it:
        if prev > x:
            return False
        prev = x
    return True

Using Nathan Farrington’s benchmark, this achieves better runtime than using sorted(lst) in all cases except when running on the large sorted list.

Here are the benchmark results on my computer.

sorted(lst)==lst solution

  • L1: 1.23838591576
  • L2: 4.19063091278
  • L3: 1.17996287346
  • L4: 4.68399500847

Second solution:

  • L1: 0.81095790863
  • L2: 0.802397012711
  • L3: 1.06135106087
  • L4: 8.82761001587

回答 12

如果您想要最快的方法用于numpy数组,请使用numba,如果使用conda,则应已安装

该代码将很快,因为它将由numba编译

import numba
@numba.jit
def issorted(vec, ascending=True):
    if len(vec) < 2:
        return True
    if ascending:
        for i in range(1, len(vec)):
            if vec[i-1] > vec[i]:
                return False
        return True
    else:
        for i in range(1, len(vec)):
            if vec[i-1] < vec[i]:
                return False
        return True

然后:

>>> issorted(array([4,9,100]))
>>> True

If you want the fastest way for numpy arrays, use numba, which if you use conda should be already installed

The code will be fast because it will be compiled by numba

import numba
@numba.jit
def issorted(vec, ascending=True):
    if len(vec) < 2:
        return True
    if ascending:
        for i in range(1, len(vec)):
            if vec[i-1] > vec[i]:
                return False
        return True
    else:
        for i in range(1, len(vec)):
            if vec[i-1] < vec[i]:
                return False
        return True

and then:

>>> issorted(array([4,9,100]))
>>> True

回答 13

再补充一种方式(尽管它需要一个额外的模块):iteration_utilities.all_monotone:

>>> from iteration_utilities import all_monotone
>>> listtimestamps = [1, 2, 3, 5, 6, 7]
>>> all_monotone(listtimestamps)
True

>>> all_monotone([1,2,1])
False

要检查 DESC(降序)顺序:

>>> all_monotone(listtimestamps, decreasing=True)
False

>>> all_monotone([3,2,1], decreasing=True)
True

如果需要检查严格单调(即连续元素不能相等)的序列,还有一个 strict 参数。

在您的情况下这不是问题,但是如果您的序列包含nan值,那么某些方法将失败,例如sorted:

def is_sorted_using_sorted(iterable):
    return sorted(iterable) == iterable

>>> is_sorted_using_sorted([3, float('nan'), 1])  # definitely False, right?
True

>>> all_monotone([3, float('nan'), 1])
False

请注意,iteration_utilities.all_monotone与此处提到的其他解决方案相比,该方法的执行速度更快,尤其是对于未排序的输入(请参阅基准)。

Just to add another way (even if it requires an additional module): iteration_utilities.all_monotone:

>>> from iteration_utilities import all_monotone
>>> listtimestamps = [1, 2, 3, 5, 6, 7]
>>> all_monotone(listtimestamps)
True

>>> all_monotone([1,2,1])
False

To check for DESC order:

>>> all_monotone(listtimestamps, decreasing=True)
False

>>> all_monotone([3,2,1], decreasing=True)
True

There is also a strict parameter if you need to check for strictly (if successive elements should not be equal) monotonic sequences.

It’s not a problem in your case but if your sequences contains nan values then some methods will fail, for example with sorted:

def is_sorted_using_sorted(iterable):
    return sorted(iterable) == iterable

>>> is_sorted_using_sorted([3, float('nan'), 1])  # definitely False, right?
True

>>> all_monotone([3, float('nan'), 1])
False

Note that iteration_utilities.all_monotone performs faster compared to the other solutions mentioned here especially for unsorted inputs (see benchmark).


回答 14

from itertools import tee

def is_sorted(l):
    l1, l2 = tee(l)
    next(l2, None)
    return all(a <= b for a, b in zip(l1, l2))

Lazy

from itertools import tee

def is_sorted(l):
    l1, l2 = tee(l)
    next(l2, None)
    return all(a <= b for a, b in zip(l1, l2))

回答 15

Python 3.6.8

from more_itertools import pairwise

class AssertionHelper:
    @classmethod
    def is_ascending(cls, data: iter) -> bool:
        for a, b in pairwise(data):
            if a > b:
                return False
        return True

    @classmethod
    def is_descending(cls, data: iter) -> bool:
        for a, b in pairwise(data):
            if a < b:
                return False
        return True

    @classmethod
    def is_sorted(cls, data: iter) -> bool:
        return cls.is_ascending(data) or cls.is_descending(data)

>>> AssertionHelper.is_descending((1, 2, 3, 4))
False
>>> AssertionHelper.is_ascending((1, 2, 3, 4))
True
>>> AssertionHelper.is_sorted((1, 2, 3, 4))
True

Python 3.6.8

from more_itertools import pairwise

class AssertionHelper:
    @classmethod
    def is_ascending(cls, data: iter) -> bool:
        for a, b in pairwise(data):
            if a > b:
                return False
        return True

    @classmethod
    def is_descending(cls, data: iter) -> bool:
        for a, b in pairwise(data):
            if a < b:
                return False
        return True

    @classmethod
    def is_sorted(cls, data: iter) -> bool:
        return cls.is_ascending(data) or cls.is_descending(data)

>>> AssertionHelper.is_descending((1, 2, 3, 4))
False
>>> AssertionHelper.is_ascending((1, 2, 3, 4))
True
>>> AssertionHelper.is_sorted((1, 2, 3, 4))
True


回答 16

最简单的方法:

def isSorted(arr):
  i = 1
  while i < len(arr):
    if(arr[i] < arr[i - 1]):
      return False
    i += 1
  return True

Simplest way:

def isSorted(arr):
  i = 1
  while i < len(arr):
    if(arr[i] < arr[i - 1]):
      return False
    i += 1
  return True

回答 17

from functools import reduce

# myiterable can be of any iterable type (including list)
isSorted = reduce(lambda r, e: (r[0] and (r[1] or r[2] <= e), False, e), myiterable, (True, True, None))[0]

归约过程中传递的值是一个三元组(sortedSoFarFlag、firstTimeFlag、lastElementValue)。它从 (True, True, None) 开始,这个初值同时也是空列表的结果(空列表被视为已排序,因为没有乱序的元素)。每处理一个元素,就用前一个元组的值和下一个 elementValue 计算出新的元组:

[0] (sortedSoFarFlag) evaluates true if: prev_0 is true and (prev_1 is true or prev_2 <= elementValue)
[1] (firstTimeFlag): False
[2] (lastElementValue): elementValue

减少的最终结果是一个元组:

[0]: True/False depending on whether the entire list was in sorted order
[1]: True/False depending on whether the list was empty
[2]: the last element value

第一个值才是我们感兴趣的,因此用 [0] 从 reduce 的结果中取出它。

from functools import reduce

# myiterable can be of any iterable type (including list)
isSorted = reduce(lambda r, e: (r[0] and (r[1] or r[2] <= e), False, e), myiterable, (True, True, None))[0]

The derived reduction value is a 3-part tuple of (sortedSoFarFlag, firstTimeFlag, lastElementValue). It initially starts with (True, True, None), which is also used as the result for an empty list (regarded as sorted because there are no out-of-order elements). As it processes each element it calculates new values for the tuple (using previous tuple values with the next elementValue):

[0] (sortedSoFarFlag) evaluates true if: prev_0 is true and (prev_1 is true or prev_2 <= elementValue)
[1] (firstTimeFlag): False
[2] (lastElementValue): elementValue

The final result of the reduction is a tuple of:

[0]: True/False depending on whether the entire list was in sorted order
[1]: True/False depending on whether the list was empty
[2]: the last element value

The first value is the one we’re interested in, so we use [0] to grab that from the reduce result.
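
For instance (a hedged usage sketch; the list contents are arbitrary examples):

from functools import reduce

def is_sorted_reduce(myiterable):
    return reduce(lambda r, e: (r[0] and (r[1] or r[2] <= e), False, e),
                  myiterable, (True, True, None))[0]

print(is_sorted_reduce([1, 2, 2, 3]))   # True
print(is_sorted_reduce([1, 3, 2]))      # False
print(is_sorted_reduce([]))             # True (an empty list counts as sorted)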


回答 18

由于我在上面没有看到这个做法,就把它补充进来。设列表为 l,那么:

import numpy as np

# Transform the list to a numpy array
x = np.array(l)

# check if ascendent sorted:
all(x[:-1] <= x[1:])

# check if descendent sorted:
all(x[:-1] >= x[1:])

As I don’t see this option above, I will add it to the answers. Let the list be denoted by l; then:

import numpy as np

# Transform the list to a numpy array
x = np.array(l)

# check if ascendent sorted:
all(x[:-1] <= x[1:])

# check if descendent sorted:
all(x[:-1] >= x[1:])

回答 19

使用赋值表达式的解决方案(在Python 3.8中添加):

def is_sorted(seq):
    seq_iter = iter(seq)
    cur = next(seq_iter, None)
    return all((prev := cur) <= (cur := nxt) for nxt in seq_iter)

z = list(range(10))
print(z)
print(is_sorted(z))

import random
random.shuffle(z)
print(z)
print(is_sorted(z))

z = []
print(z)
print(is_sorted(z))

给出:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
True
[1, 7, 5, 9, 4, 0, 8, 3, 2, 6]
False
[]
True

A solution using assignment expressions (added in Python 3.8):

def is_sorted(seq):
    seq_iter = iter(seq)
    cur = next(seq_iter, None)
    return all((prev := cur) <= (cur := nxt) for nxt in seq_iter)

z = list(range(10))
print(z)
print(is_sorted(z))

import random
random.shuffle(z)
print(z)
print(is_sorted(z))

z = []
print(z)
print(is_sorted(z))

Gives:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
True
[1, 7, 5, 9, 4, 0, 8, 3, 2, 6]
False
[]
True

回答 20

实际上,这是使用递归实现的最短方法:

如果已排序将打印True,否则将打印False

 def is_Sorted(lst):
    if len(lst) == 1:
       return True
    return lst[0] <= lst[1] and is_Sorted(lst[1:])

 any_list = [1,2,3,4]
 print is_Sorted(any_list)

This is in fact the shortest way to do it using recursion:

If the list is sorted it will print True, otherwise it will print False.

 def is_Sorted(lst):
    if len(lst) == 1:
       return True
    return lst[0] <= lst[1] and is_Sorted(lst[1:])

 any_list = [1,2,3,4]
 print is_Sorted(any_list)

回答 21

这个怎么样 ?简单明了。

def is_list_sorted(al):

    llength = len(al)

    for i in range(1, llength):
        if (al[i - 1] > al[i]):
            print(al[i - 1])
            print(al[i])
            print('Not sorted')
            return -1

    else:
        print('sorted')
        return True

How about this one ? Simple and straightforward.

def is_list_sorted(al):

    llength = len(al)

    for i in range(1, llength):
        if (al[i - 1] > al[i]):
            print(al[i - 1])
            print(al[i])
            print('Not sorted')
            return -1

    else:
        print('sorted')
        return True

回答 22

绝对在Python 3及更高版本中适用于整数或字符串:

def tail(t):
    return t[:]

letters = ['a', 'b', 'c', 'd', 'e']
rest = tail(letters)
rest.sort()
if letters == rest:
    print ('Given list is SORTED.')
else:
    print ('List NOT Sorted.')

=====================================================================

查找给定列表是否已排序的另一种方法

trees1 = list ([1, 4, 5, 3, 2])
trees2 = list (trees1)
trees2.sort()
if trees1 == trees2:
    print ('trees1 is SORTED')
else:
    print ('Not sorted')

Definitely works in Python 3 and above for integers or strings:

def tail(t):
    return t[:]

letters = ['a', 'b', 'c', 'd', 'e']
rest = tail(letters)
rest.sort()
if letters == rest:
    print ('Given list is SORTED.')
else:
    print ('List NOT Sorted.')

=====================================================================

Another way of finding if the given list is sorted or not

trees1 = list ([1, 4, 5, 3, 2])
trees2 = list (trees1)
trees2.sort()
if trees1 == trees2:
    print ('trees1 is SORTED')
else:
    print ('Not sorted')

在Python中查找数字的所有因子的最有效方法是什么?

问题:在Python中查找数字的所有因子的最有效方法是什么?

有人可以向我解释一种在Python(2.7)中找到一个数字的所有因子的有效方法吗?

我可以写一个算法来做这件事,但我觉得写得很差,而且对于很大的数字要花很长时间才能得出结果。

Can someone explain to me an efficient way of finding all the factors of a number in Python (2.7)?

I can create an algorithm to do this, but I think it is poorly coded and takes too long to produce a result for a large number.


回答 0

from functools import reduce

def factors(n):    
    return set(reduce(list.__add__, 
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

这将非常快速地返回 n 的所有因子。

为什么以平方根为上限?

sqrt(x) * sqrt(x) = x。因此,如果两个因子相同,它们就都等于平方根;如果把其中一个因子变大,就必须把另一个变小。这意味着两者之一总是小于或等于 sqrt(x),所以只需搜索到这个位置,就能找到两个匹配因子中的一个,然后用 x / fac1 即可得到 fac2。

reduce(list.__add__, ...) 会把一个个小列表 [fac1, fac2] 拼接成一个长列表。

[i, n/i] for i in range(1, int(sqrt(n)) + 1) if n % i == 0:当 n 除以较小的因子 i 余数为零时,返回这一对因子(不需要再检查较大的那个;它直接由 n 除以较小的因子得到)。

最外层的 set(...) 用来去除重复,重复只会出现在完全平方数的情况。例如 n = 4 时会返回两次 2,set 会去掉其中一个。

from functools import reduce

def factors(n):    
    return set(reduce(list.__add__, 
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

This will return all of the factors, very quickly, of a number n.

Why square root as the upper limit?

sqrt(x) * sqrt(x) = x. So if the two factors are the same, they’re both the square root. If you make one factor bigger, you have to make the other factor smaller. This means that one of the two will always be less than or equal to sqrt(x), so you only have to search up to that point to find one of the two matching factors. You can then use x / fac1 to get fac2.

The reduce(list.__add__, ...) is taking the little lists of [fac1, fac2] and joining them together in one long list.

The [i, n/i] for i in range(1, int(sqrt(n)) + 1) if n % i == 0 returns a pair of factors if the remainder when you divide n by the smaller one is zero (it doesn’t need to check the larger one too; it just gets that by dividing n by the smaller one.)

The set(...) on the outside is getting rid of duplicates, which only happens for perfect squares. For n = 4, this will return 2 twice, so set gets rid of one of them.


回答 1

@agf 提出的解决方案很棒,但对任意奇数,通过奇偶性检查还可以把运行时间再缩短约 50%:由于奇数的因子本身一定是奇数,处理奇数时就不必检查偶数的候选因子。

我自己刚刚开始做欧拉计划(Project Euler)的题目。在某些问题中,除数检查会在两层嵌套的 for 循环内被调用,因此这个函数的性能至关重要。

将这一事实与 agf 的出色解决方案相结合,我最终得到了下面这个函数:

from math import sqrt
def factors(n):
        step = 2 if n%2 else 1
        return set(reduce(list.__add__,
                    ([i, n//i] for i in range(1, int(sqrt(n))+1, step) if n % i == 0)))

但是,对于较小的数字(约小于 100),此更改带来的额外开销可能反而使函数变慢。

我做了一些测试来检查速度,下面是使用的代码。为了生成不同的图表,我相应地修改了 X = range(1,100,1)。

import timeit
from math import sqrt
from matplotlib.pyplot import plot, legend, show

def factors_1(n):
    step = 2 if n%2 else 1
    return set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(sqrt(n))+1, step) if n % i == 0)))

def factors_2(n):
    return set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(sqrt(n)) + 1) if n % i == 0)))

X = range(1,100000,1000)
Y = []
for i in X:
    f_1 = timeit.timeit('factors_1({})'.format(i), setup='from __main__ import factors_1', number=10000)
    f_2 = timeit.timeit('factors_2({})'.format(i), setup='from __main__ import factors_2', number=10000)
    Y.append(f_1/f_2)
plot(X,Y, label='Running time with/without parity check')
legend()
show()

X = range(1,100,1)

这里没有显着差异,但是数量更大时,优势显而易见:

X = range(1,100000,1000)(仅奇数)

X = range(2,100000,100)(仅偶数)

X = range(1,100000,1001)(交替奇偶)

The solution presented by @agf is great, but one can achieve a ~50% faster run time for an arbitrary odd number by checking for parity. As the factors of an odd number are always odd themselves, it is not necessary to check even candidates when dealing with odd numbers.

I’ve just started solving Project Euler puzzles myself. In some problems, a divisor check is called inside two nested for loops, and the performance of this function is thus essential.

Combining this fact with agf’s excellent solution, I’ve ended up with this function:

from math import sqrt
def factors(n):
        step = 2 if n%2 else 1
        return set(reduce(list.__add__,
                    ([i, n//i] for i in range(1, int(sqrt(n))+1, step) if n % i == 0)))

However, on small numbers (~ < 100), the extra overhead from this alteration may cause the function to take longer.

I ran some tests in order to check the speed. Below is the code used. To produce the different plots, I altered the X = range(1,100,1) accordingly.

import timeit
from math import sqrt
from matplotlib.pyplot import plot, legend, show

def factors_1(n):
    step = 2 if n%2 else 1
    return set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(sqrt(n))+1, step) if n % i == 0)))

def factors_2(n):
    return set(reduce(list.__add__,
                ([i, n//i] for i in range(1, int(sqrt(n)) + 1) if n % i == 0)))

X = range(1,100000,1000)
Y = []
for i in X:
    f_1 = timeit.timeit('factors_1({})'.format(i), setup='from __main__ import factors_1', number=10000)
    f_2 = timeit.timeit('factors_2({})'.format(i), setup='from __main__ import factors_2', number=10000)
    Y.append(f_1/f_2)
plot(X,Y, label='Running time with/without parity check')
legend()
show()

X = range(1,100,1)

No significant difference here, but with bigger numbers, the advantage is obvious:

X = range(1,100000,1000) (only odd numbers)

X = range(2,100000,100) (only even numbers)

X = range(1,100000,1001) (alternating parity)


回答 2

AGF的答案确实很酷。我想看看是否可以重写它以避免使用reduce()。这是我想出的:

import itertools
flatten_iter = itertools.chain.from_iterable
def factors(n):
    return set(flatten_iter((i, n//i) 
                for i in range(1, int(n**0.5)+1) if n % i == 0))

我还尝试了使用棘手的生成器功能的版本:

def factors(n):
    return set(x for tup in ([i, n//i] 
                for i in range(1, int(n**0.5)+1) if n % i == 0) for x in tup)

我通过计算来计时:

start = 10000000
end = start + 40000
for n in range(start, end):
    factors(n)

我运行了一次以让Python对其进行编译,然后在time(1)命令下运行了三次,并保持了最佳时间。

  • reduce 版本:11.58 秒
  • itertools 版本:11.49 秒
  • 棘手的生成器版本:11.12 秒

注意itertools版本正在构建一个元组,并将其传递给flatten_iter()。如果我更改代码以构建列表,则它会稍微降低速度:

  • itertools(列表)版本:11.62 秒

我相信棘手的生成器函数版本是 Python 中最快的,但它并没有比 reduce 版本快多少,根据我的测量大约只快 4%。

agf’s answer is really quite cool. I wanted to see if I could rewrite it to avoid using reduce(). This is what I came up with:

import itertools
flatten_iter = itertools.chain.from_iterable
def factors(n):
    return set(flatten_iter((i, n//i) 
                for i in range(1, int(n**0.5)+1) if n % i == 0))

I also tried a version that uses tricky generator functions:

def factors(n):
    return set(x for tup in ([i, n//i] 
                for i in range(1, int(n**0.5)+1) if n % i == 0) for x in tup)

I timed it by computing:

start = 10000000
end = start + 40000
for n in range(start, end):
    factors(n)

I ran it once to let Python compile it, then ran it under the time(1) command three times and kept the best time.

  • reduce version: 11.58 seconds
  • itertools version: 11.49 seconds
  • tricky version: 11.12 seconds

Note that the itertools version is building a tuple and passing it to flatten_iter(). If I change the code to build a list instead, it slows down slightly:

  • itertools (list) version: 11.62 seconds

I believe that the tricky generator functions version is the fastest possible in Python. But it’s not really much faster than the reduce version, roughly 4% faster based on my measurements.


回答 3

AGF回答的另一种方法:

def factors(n):    
    result = set()
    for i in range(1, int(n ** 0.5) + 1):
        div, mod = divmod(n, i)
        if mod == 0:
            result |= {i, div}
    return result

An alternative approach to agf’s answer:

def factors(n):    
    result = set()
    for i in range(1, int(n ** 0.5) + 1):
        div, mod = divmod(n, i)
        if mod == 0:
            result |= {i, div}
    return result

回答 4

这是@agf解决方案的替代方法,该解决方案以更Python的样式实现相同的算法:

def factors(n):
    return set(
        factor for i in range(1, int(n**0.5) + 1) if n % i == 0
        for factor in (i, n//i)
    )

此解决方案在没有导入的Python 2和Python 3中均可使用,并且可读性更高。我还没有测试这种方法的性能,但是渐近地它应该是相同的,并且如果性能是一个严重的问题,那么这两种解决方案都不是最优的。

Here’s an alternative to @agf’s solution which implements the same algorithm in a more pythonic style:

def factors(n):
    return set(
        factor for i in range(1, int(n**0.5) + 1) if n % i == 0
        for factor in (i, n//i)
    )

This solution works in both Python 2 and Python 3 with no imports and is much more readable. I haven’t tested the performance of this approach, but asymptotically it should be the same, and if performance is a serious concern, neither solution is optimal.


回答 5

SymPy 中有一个名为 factorint 的工业级算法:

>>> from sympy import factorint
>>> factorint(2**70 + 3**80) 
{5: 2,
 41: 1,
 101: 1,
 181: 1,
 821: 1,
 1597: 1,
 5393: 1,
 27188665321L: 1,
 41030818561L: 1}

这花了不到一分钟的时间。它在多种方法之间切换。请参阅上面链接的文档。

考虑到所有主要因素,可以轻松构建所有其他因素。


请注意,即使允许已接受答案的算法运行足够长的时间(即永远)来分解上面的数字,对于某些较大的数字它仍会失败,例如下面的例子。这是由不精确的 int(n**0.5) 造成的。例如,当 n = 10000000000000079**2 时,我们有

>>> int(n**0.5)
10000000000000078L

由于 10000000000000079 是质数,已接受答案的算法将永远找不到这个因子。请注意,这不只是差 1;对于更大的数字,误差会更大。因此,在此类算法中最好避免使用浮点数。

There is an industry-strength algorithm in SymPy called factorint:

>>> from sympy import factorint
>>> factorint(2**70 + 3**80) 
{5: 2,
 41: 1,
 101: 1,
 181: 1,
 821: 1,
 1597: 1,
 5393: 1,
 27188665321L: 1,
 41030818561L: 1}

This took under a minute. It switches among a cocktail of methods. See the documentation linked above.

Given all the prime factors, all other factors can be built easily.
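
As a minimal sketch of that step (my code, not part of the answer), the {prime: exponent} dict returned by factorint can be expanded into the full set of divisors by multiplying out every combination of prime powers:

from itertools import product
from sympy import factorint

def all_divisors(n):
    pf = factorint(n)                          # e.g. 360 -> {2: 3, 3: 2, 5: 1}
    primes = list(pf)
    exponent_ranges = [range(e + 1) for e in pf.values()]
    divs = []
    for exponents in product(*exponent_ranges):
        d = 1
        for p, e in zip(primes, exponents):
            d *= p ** e
        divs.append(d)
    return sorted(divs)

print(all_divisors(360))  # all 24 divisors of 360, from 1 up to 360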


Note that even if the accepted answer was allowed to run for long enough (i.e. an eternity) to factor the above number, for some large numbers it will fail, such as the following example. This is due to the sloppy int(n**0.5). For example, when n = 10000000000000079**2, we have

>>> int(n**0.5)
10000000000000078L

Since 10000000000000079 is a prime, the accepted answer’s algorithm will never find this factor. Note that it’s not just an off-by-one; for larger numbers it will be off by more. For this reason it’s better to avoid floating-point numbers in algorithms of this sort.
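
One way to avoid the floating-point issue, assuming Python 3.8+ where math.isqrt is available, is to use the exact integer square root instead of int(n**0.5):

import math

n = 10000000000000079**2
print(int(n**0.5))    # off by one because of float rounding, as shown above
print(math.isqrt(n))  # 10000000000000079 -- exact, so the prime factor is not missed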


回答 6

对于n高达10 ** 16(甚至更多)的情况,这是一个快速的纯Python 3.6解决方案,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

For n up to 10**16 (maybe even a bit more), here is a fast pure Python 3.6 solution,

from itertools import compress

def primes(n):
    """ Returns  a list of primes < n for n > 2 """
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    return [2,*compress(range(3,n,2), sieve[1:])]

def factorization(n):
    """ Returns a list of the prime factorization of n """
    pf = []
    for p in primeslist:
      if p*p > n : break
      count = 0
      while not n % p:
        n //= p
        count += 1
      if count > 0: pf.append((p, count))
    if n > 1: pf.append((n, 1))
    return pf

def divisors(n):
    """ Returns an unsorted list of the divisors of n """
    divs = [1]
    for p, e in factorization(n):
        divs += [x*p**k for k in range(1,e+1) for x in divs]
    return divs

n = 600851475143
primeslist = primes(int(n**0.5)+1) 
print(divisors(n))

回答 7

对 agf 和 eryksun 解决方案的进一步改进。下面的代码返回所有因数的排序列表,而不改变运行时间的渐近复杂度:

    def factors(n):    
        l1, l2 = [], []
        for i in range(1, int(n ** 0.5) + 1):
            q,r = n//i, n%i     # Alter: divmod() fn can be used.
            if r == 0:
                l1.append(i) 
                l2.append(q)    # q's obtained are decreasing.
        if l1[-1] == l2[-1]:    # To avoid duplication of the possible factor sqrt(n)
            l1.pop()
        l2.reverse()
        return l1 + l2

思路:与其使用 list.sort() 函数获得有序列表(复杂度为 n log(n)),不如对 l2 使用 list.reverse(),其复杂度只有 O(n),速度快得多。(这是由 Python 列表的实现方式决定的。)在 l2.reverse() 之后,把 l2 接到 l1 后面即可得到排好序的因数列表。

注意,l1 中的 i 是递增的,而 l2 中的 q 是递减的。这正是采用上述思路的原因。

Further improvement to agf & eryksun’s solution. The following piece of code returns a sorted list of all the factors without changing the run time’s asymptotic complexity:

    def factors(n):    
        l1, l2 = [], []
        for i in range(1, int(n ** 0.5) + 1):
            q,r = n//i, n%i     # Alter: divmod() fn can be used.
            if r == 0:
                l1.append(i) 
                l2.append(q)    # q's obtained are decreasing.
        if l1[-1] == l2[-1]:    # To avoid duplication of the possible factor sqrt(n)
            l1.pop()
        l2.reverse()
        return l1 + l2

Idea: Instead of using the list.sort() function to get a sorted list, which costs n log(n), it is much faster to use list.reverse() on l2, which is only O(n). (That’s just how Python lists work.) After l2.reverse(), l2 can be appended to l1 to get the sorted list of factors.

Notice that l1 contains the i values, which are increasing, and l2 contains the q values, which are decreasing. That’s the reason the above idea works.
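
A quick check of the function above (my example):

print(factors(12))  # [1, 2, 3, 4, 6, 12]
print(factors(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36] -- already sorted, no sort() needed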


回答 8

我用timeit尝试了这些绝妙的答案中的大多数,以比较它们的效率与我的简单功能,但我不断看到我的表现优于此处列出的那些。我想分享一下,看看大家都怎么想。

def factors(n):
    results = set()
    for i in xrange(1, int(math.sqrt(n)) + 1):
        if n % i == 0:
            results.add(i)
            results.add(int(n/i))
    return results

按目前的写法,您需要先 import math 才能测试,但把 math.sqrt(n) 换成 n**.5 应该同样有效。我不会浪费时间检查重复项,因为集合中本来就不会有重复项。

I’ve tried most of these wonderful answers with timeit to compare their efficiency versus my simple function and yet I constantly see mine outperform those listed here. I figured I’d share it and see what you all think.

def factors(n):
    results = set()
    for i in xrange(1, int(math.sqrt(n)) + 1):
        if n % i == 0:
            results.add(i)
            results.add(int(n/i))
    return results

As it’s written you’ll have to import math to test, but replacing math.sqrt(n) with n**.5 should work just as well. I don’t bother wasting time checking for duplicates because duplicates can’t exist in a set regardless.


回答 9

这是另一个不使用 reduce 的替代方法,在处理大数时表现良好。它使用 sum 来展平列表。

def factors(n):
    return set(sum([[i, n//i] for i in xrange(1, int(n**0.5)+1) if not n%i], []))

Here is another alternate without reduce that performs well with large numbers. It uses sum to flatten the list.

def factors(n):
    return set(sum([[i, n//i] for i in xrange(1, int(n**0.5)+1) if not n%i], []))

回答 10

对于像 99 这样的“不寻常”数字(99 = 3*3*11,而 floor(sqrt(99))+1 == 10),请确保把大于 sqrt(number_to_factor) 的那个剩余因子也收集进来。

import math

def factor(x):
  if x == 0 or x == 1:
    return None
  res = []
  for i in range(2,int(math.floor(math.sqrt(x)+1))):
    while x % i == 0:
      x //= i  # integer division (x /= i would turn x into a float on Python 3)
      res.append(i)
  if x != 1: # Unusual numbers
    res.append(x)
  return res

Be sure to grab the number larger than sqrt(number_to_factor) for unusual numbers like 99 which has 3*3*11 and floor sqrt(99)+1 == 10.

import math

def factor(x):
  if x == 0 or x == 1:
    return None
  res = []
  for i in range(2,int(math.floor(math.sqrt(x)+1))):
    while x % i == 0:
      x //= i  # integer division (x /= i would turn x into a float on Python 3)
      res.append(i)
  if x != 1: # Unusual numbers
    res.append(x)
  return res

回答 11

查找数量因子的最简单方法:

def factors(x):
    return [i for i in range(1,x+1) if x%i==0]

The simplest way of finding factors of a number:

def factors(x):
    return [i for i in range(1,x+1) if x%i==0]

回答 12

如果您想利用质数表来大幅加快速度,这里有一个示例。这样的质数表在网上很容易找到。我在代码中添加了注释。

# http://primes.utm.edu/lists/small/10000.txt
# First 10000 primes

_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
        73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 
        127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 
        179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 
        233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 
        283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 
        353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 
        419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 
        467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 
        547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 
        607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 
        661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 
        739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 
        811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 
        877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 
        947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 
# Missing a lot of primes for the purpose of the example
)


from bisect import bisect_left as _bisect_left
from math import sqrt as _sqrt


def get_factors(n):
    assert isinstance(n, int), "n must be an integer."
    assert n > 0, "n must be greater than zero."
    limit = pow(_PRIMES[-1], 2)
    assert n <= limit, "n is greater than the limit of {0}".format(limit)
    result = set((1, n))
    root = int(_sqrt(n))
    primes = [t for t in get_primes_smaller_than(root + 1) if not n % t]
    result.update(primes)  # Add all the primes factors less or equal to root square
    for t in primes:
        result.update(get_factors(n//t))  # Add all the factors associated with the primes by the same process (// keeps the argument an int)
    return sorted(result)


def get_primes_smaller_than(n):
    return _PRIMES[:_bisect_left(_PRIMES, n)]

Here is an example if you want to use prime numbers to go a lot faster. Such lists are easy to find on the internet. I added comments in the code.

# http://primes.utm.edu/lists/small/10000.txt
# First 10000 primes

_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
        73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 
        127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 
        179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 
        233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 
        283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 
        353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 
        419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 
        467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 
        547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 
        607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 
        661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 
        739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 
        811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 
        877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 
        947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 
# Missing a lot of primes for the purpose of the example
)


from bisect import bisect_left as _bisect_left
from math import sqrt as _sqrt


def get_factors(n):
    assert isinstance(n, int), "n must be an integer."
    assert n > 0, "n must be greater than zero."
    limit = pow(_PRIMES[-1], 2)
    assert n <= limit, "n is greater than the limit of {0}".format(limit)
    result = set((1, n))
    root = int(_sqrt(n))
    primes = [t for t in get_primes_smaller_than(root + 1) if not n % t]
    result.update(primes)  # Add all the primes factors less or equal to root square
    for t in primes:
        result.update(get_factors(n//t))  # Add all the factors associated with the primes by the same process (// keeps the argument an int)
    return sorted(result)


def get_primes_smaller_than(n):
    return _PRIMES[:_bisect_left(_PRIMES, n)]

回答 13

这可能是一个比此处已有答案更高效的算法(尤其是当 n 含有较小的素因数时)。诀窍在于:每次找到素因数后,就调整试除法需要检查到的上限:

from math import sqrt

def factors(n):
    '''
    return prime factors and multiplicity of n
    n = p0^e0 * p1^e1 * ... * pk^ek encoded as
    res = [(p0, e0), (p1, e1), ..., (pk, ek)]
    '''

    res = []

    # get rid of all the factors of 2 using bit shifts
    mult = 0
    while not n & 1:
        mult += 1
        n >>= 1
    if mult != 0:
        res.append((2, mult))

    limit = round(sqrt(n))
    test_prime = 3
    while test_prime <= limit:
        mult = 0
        while n % test_prime == 0:
            mult += 1
            n //= test_prime
        if mult != 0:
            res.append((test_prime, mult))
            if n == 1:              # only useful if ek >= 3 (ek: multiplicity
                break               # of the last prime) 
            limit = round(sqrt(n))  # adjust the limit
        test_prime += 2             # will often not be prime...
    if n != 1:
        res.append((n, 1))
    return res

当然,这仍然只是试除法,没有任何更高级的技巧。因此它的效率仍然非常有限(尤其是对于没有小除数的大数)。

这是 Python 3 代码;整除运算 // 应该是您适配 Python 2 时唯一需要改动的地方(加上 from __future__ import division 即可)。

A potentially more efficient algorithm than the ones presented here already (especially if there are small prime factors in n). The trick here is to adjust the limit up to which trial division is needed every time a prime factor is found:

from math import sqrt

def factors(n):
    '''
    return prime factors and multiplicity of n
    n = p0^e0 * p1^e1 * ... * pk^ek encoded as
    res = [(p0, e0), (p1, e1), ..., (pk, ek)]
    '''

    res = []

    # get rid of all the factors of 2 using bit shifts
    mult = 0
    while not n & 1:
        mult += 1
        n >>= 1
    if mult != 0:
        res.append((2, mult))

    limit = round(sqrt(n))
    test_prime = 3
    while test_prime <= limit:
        mult = 0
        while n % test_prime == 0:
            mult += 1
            n //= test_prime
        if mult != 0:
            res.append((test_prime, mult))
            if n == 1:              # only useful if ek >= 3 (ek: multiplicity
                break               # of the last prime) 
            limit = round(sqrt(n))  # adjust the limit
        test_prime += 2             # will often not be prime...
    if n != 1:
        res.append((n, 1))
    return res

This is of course still trial division and nothing fancier, and therefore still very limited in its efficiency (especially for big numbers without small divisors).

This is Python 3; the division // should be the only thing you need to adapt for Python 2 (add from __future__ import division).
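
A quick check (my example, using the sqrt import noted above):

print(factors(360))  # [(2, 3), (3, 2), (5, 1)], i.e. 360 = 2**3 * 3**2 * 5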


回答 14

使用set(...)会使代码稍微慢一些,并且仅在检查平方根时才真正需要。这是我的版本:

def factors(num):
    if (num == 1 or num == 0):
        return []
    f = [1]
    sq = int(math.sqrt(num))
    for i in range(2, sq):
        if num % i == 0:
            f.append(i)
            f.append(num/i)
    if sq > 1 and num % sq == 0:
        f.append(sq)
        if sq*sq != num:
            f.append(num/sq)
    return f

对于 12 之类的数字,if sq*sq != num: 这个条件是必需的:它们的平方根不是整数,但平方根向下取整后是一个因数。

请注意,此版本不会返回数字本身,但如果需要,这很容易补上。输出也是未排序的。

我对它计时:在 1-200 的所有数字上各运行 10000 次,在 1-5000 的所有数字上各运行 100 次。它的性能优于我测试过的所有其他版本,包括 dansalmo、Jason Schorn、oxrock、agf、steveha 和 eryksun 的解决方案,其中 oxrock 的最接近。

Using set(...) makes the code slightly slower, and is only really necessary for when you check the square root. Here’s my version:

def factors(num):
    if (num == 1 or num == 0):
        return []
    f = [1]
    sq = int(math.sqrt(num))
    for i in range(2, sq):
        if num % i == 0:
            f.append(i)
            f.append(num/i)
    if sq > 1 and num % sq == 0:
        f.append(sq)
        if sq*sq != num:
            f.append(num/sq)
    return f

The if sq*sq != num: condition is necessary for numbers like 12, where the square root is not an integer, but the floor of the square root is a factor.

Note that this version doesn’t return the number itself, but that is an easy fix if you want it. The output also isn’t sorted.

I timed it running 10000 times on all numbers 1-200 and 100 times on all numbers 1-5000. It outperforms all the other versions I tested, including dansalmo’s, Jason Schorn’s, oxrock’s, agf’s, steveha’s, and eryksun’s solutions, though oxrock’s is by far the closest.


回答 15

您的最大因数不超过您的数字,所以,假设

def factors(n):
    factors = []
    for i in range(1, n//2+1):
        if n % i == 0:
            factors.append (i)
    factors.append(n)

    return factors

瞧!

your max factor is not more than your number, so, let’s say

def factors(n):
    factors = []
    for i in range(1, n//2+1):
        if n % i == 0:
            factors.append (i)
    factors.append(n)

    return factors

voilá!


回答 16

import math

'''
I applied finding prime factorization to solve this. (Trial Division)
It's not complicated
'''


def generate_factors(n):
    lower_bound_check = int(math.sqrt(n))  # lowest bound of the divisor range [sqrt(16) = 4]
    factors = set()  # store factors
    for divisors in range(1, lower_bound_check + 1):  # loop [1 .. 4]
        if n % divisors == 0:
            factors.add(divisors)  # lower bound divisor is found: for 16 -> [1, 2, 4]
            factors.add(n // divisors)  # get the matching upper divisor [16/1 = 16, 16/2 = 8, 16/4 = 4]
    return factors  # [1, 2, 4, 8, 16]


print(generate_factors(12))  # {1, 2, 3, 4, 6, 12} -> pycharm output

Pierre Vriens, hopefully this makes more sense. This runs in O(sqrt(n)) time.
import math

'''
I applied finding prime factorization to solve this. (Trial Division)
It's not complicated
'''


def generate_factors(n):
    lower_bound_check = int(math.sqrt(n))  # lowest bound of the divisor range [sqrt(16) = 4]
    factors = set()  # store factors
    for divisors in range(1, lower_bound_check + 1):  # loop [1 .. 4]
        if n % divisors == 0:
            factors.add(divisors)  # lower bound divisor is found: for 16 -> [1, 2, 4]
            factors.add(n // divisors)  # get the matching upper divisor [16/1 = 16, 16/2 = 8, 16/4 = 4]
    return factors  # [1, 2, 4, 8, 16]


print(generate_factors(12))  # {1, 2, 3, 4, 6, 12} -> pycharm output

Pierre Vriens, hopefully this makes more sense. This runs in O(sqrt(n)) time.

回答 17

使用与以下列表推导一样简单的方法,请注意,我们不需要测试1和我们要查找的数字:

def factors(n):
    return [x for x in range(2, n//2+1) if n%x == 0]

关于平方根的用法:假设我们要查找 10 的因数。sqrt(10) 的整数部分是 4,因此 range(1, int(sqrt(10))) = [1, 2, 3, 4],只测试到 4 显然会漏掉 5。

除非我遗漏了什么,如果您一定要这么做,我建议使用 int(ceil(sqrt(x)))。当然,这会产生很多不必要的函数调用。

Use something as simple as the following list comprehension, noting that we do not need to test 1 and the number we are trying to find:

def factors(n):
    return [x for x in range(2, n//2+1) if n%x == 0]

In reference to the use of square root, say we want to find factors of 10. The integer portion of the sqrt(10) = 4 therefore range(1, int(sqrt(10))) = [1, 2, 3, 4] and testing up to 4 clearly misses 5.

Unless I am missing something I would suggest, if you must do it this way, using int(ceil(sqrt(x))). Of course this produces a lot of unnecessary calls to functions.


回答 18

我认为在可读性和速度方面 @oxrock 的解决方案是最好的,所以这里是为 Python 3+ 重写的代码:

def num_factors(n):
    results = set()
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0: results.update([i,int(n/i)])
    return results

I think for readability and speed @oxrock’s solution is the best, so here is the code rewritten for python 3+:

def num_factors(n):
    results = set()
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0: results.update([i,int(n/i)])
    return results

回答 19

当我看到这个问题时,我非常惊讶:numpy 明明比 Python 循环快得多,却没有人使用 numpy。我用 numpy 实现了 @agf 的解决方案,结果平均快了 8 倍。我相信,如果用 numpy 实现其他一些解决方案,也能得到惊人的运行时间。

这是我的功能:

import numpy as np
def b(n):
    r = np.arange(1, int(n ** 0.5) + 1)
    x = r[np.mod(n, r) == 0]
    return set(np.concatenate((x, n / x), axis=None))   

请注意,x 轴上的数字并不是函数的输入。函数的实际输入是 2 的 x 轴数字次幂再减 1。因此当 x 轴为 10 时,输入是 2**10-1 = 1023。

使用numpy代替for循环的性能测试结果。

I was pretty surprised when I saw this question that no one used numpy, even though numpy is way faster than python loops. Implementing @agf’s solution with numpy turned out on average 8x faster. I believe that if you implemented some of the other solutions in numpy you could get amazing times.

Here is my function:

import numpy as np
def b(n):
    r = np.arange(1, int(n ** 0.5) + 1)
    x = r[np.mod(n, r) == 0]
    return set(np.concatenate((x, n / x), axis=None))   

Notice that the numbers on the x-axis are not the input to the functions. The input to the functions is 2 to the power of the number on the x-axis, minus 1. So where the x-axis shows ten, the input would be 2**10-1 = 1023.

Performance test results of using numpy instead of for loops.


回答 20

import 'dart:math';
generateFactorsOfN(N){
  //determine lowest bound divisor range
  final lowerBoundCheck = sqrt(N).toInt();
  var factors = Set<int>(); //stores factors
  /**
   * Lets take 16:
   * 4 = sqrt(16)
   * start from 1 ...  4 inclusive
   * check mod 16 % 1 == 0?  set[1, (16 / 1)]
   * check mod 16 % 2 == 0?  set[1, (16 / 1) , 2 , (16 / 2)]
   * check mod 16 % 3 == 0?  set[1, (16 / 1) , 2 , (16 / 2)] -> unchanged
   * check mod 16 % 4 == 0?  set[1, (16 / 1) , 2 , (16 / 2), 4, (16 / 4)]
   *
   *  ******************* set is used to remove duplicate
   *  ******************* case 4 and (16 / 4) both equal to 4
   *  return factor set<int>.. this isn't ordered
   */

  for(var divisor = 1; divisor <= lowerBoundCheck; divisor++){
    if(N % divisor == 0){
      factors.add(divisor);
      factors.add(N ~/ divisor); // ~/ integer division 
    }
  }
  return factors;
}
import 'dart:math';
generateFactorsOfN(N){
  //determine lowest bound divisor range
  final lowerBoundCheck = sqrt(N).toInt();
  var factors = Set<int>(); //stores factors
  /**
   * Lets take 16:
   * 4 = sqrt(16)
   * start from 1 ...  4 inclusive
   * check mod 16 % 1 == 0?  set[1, (16 / 1)]
   * check mod 16 % 2 == 0?  set[1, (16 / 1) , 2 , (16 / 2)]
   * check mod 16 % 3 == 0?  set[1, (16 / 1) , 2 , (16 / 2)] -> unchanged
   * check mod 16 % 4 == 0?  set[1, (16 / 1) , 2 , (16 / 2), 4, (16 / 4)]
   *
   *  ******************* set is used to remove duplicate
   *  ******************* case 4 and (16 / 4) both equal to 4
   *  return factor set<int>.. this isn't ordered
   */

  for(var divisor = 1; divisor <= lowerBoundCheck; divisor++){
    if(N % divisor == 0){
      factors.add(divisor);
      factors.add(N ~/ divisor); // ~/ integer division 
    }
  }
  return factors;
}

回答 21

我认为这是最简单的方法:

    x = 23

    i = 1
    while i <= x:
      if x % i == 0:
        print("factor: %s"% i)
      i += 1

I reckon this is the simplest way to do that:

    x = 23

    i = 1
    while i <= x:
      if x % i == 0:
        print("factor: %s"% i)
      i += 1

如何在Python中有效比较两个无序列表(不是集合)?

问题:如何在Python中有效比较两个无序列表(不是集合)?

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]

a和b应该视为相等,因为它们具有完全相同的元素,只是顺序不同。

问题是,我的实际列表将由对象(我的类实例)组成,而不是整数。

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]

a & b should be considered equal, because they have exactly the same elements, only in different order.

The thing is, my actual lists will consist of objects (my class instances), not integers.


回答 0

O(n):Counter() 方法最好(如果您的对象是可哈希的):

from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)

O(n log n):sorted() 方法次之(如果对象是可排序的):

def compare(s, t):
    return sorted(s) == sorted(t)

O(n * n):如果对象既不可散列也不可排序,则可以使用相等性:

def compare(s, t):
    t = list(t)   # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

O(n): The Counter() method is best (if your objects are hashable):

from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)

O(n log n): The sorted() method is next best (if your objects are orderable):

def compare(s, t):
    return sorted(s) == sorted(t)

O(n * n): If the objects are neither hashable, nor orderable, you can use equality:

def compare(s, t):
    t = list(t)   # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t
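
A quick check with the lists from the question (my example; note the Counter import):

from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
print(compare(a, b))          # True  -- same elements with the same multiplicities
print(compare(a, [1, 2, 3]))  # False -- multiplicities differ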

回答 1

您可以对两者进行排序:

sorted(a) == sorted(b)

计数排序也可能更高效(但它要求对象是可哈希的)。

>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True

You can sort both:

sorted(a) == sorted(b)

A counting sort could also be more efficient (but it requires the object to be hashable).

>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True

回答 2

如果知道元素总是可哈希的,则可以使用 Counter(),其复杂度为 O(n);
如果知道元素总是可排序的,则可以使用 sorted(),其复杂度为 O(n log n)。

在一般情况下,您不能指望元素一定可以排序或可以哈希,因此您需要像下面这样的后备方案,不幸的是它是 O(n^2) 的:

len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)

If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)

In the general case you can’t rely on being able to sort or hash the elements, so you need a fallback like this, which is unfortunately O(n^2):

len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)
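
A small illustration (my example) of why the per-element counts matter rather than plain membership:

a = [1, 1, 2]
b = [1, 2, 2]
# same length and same set of values, but different multiplicities
print(len(a) == len(b) and all(a.count(i) == b.count(i) for i in a))  # False
print(set(a) == set(b))  # True, which would be the wrong answer here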

回答 3

最好的方法是对列表进行排序并进行比较。(Counter不能用于不可哈希的对象。)对于整数,这很简单:

sorted(a) == sorted(b)

使用任意对象会变得有些棘手。如果您关心对象的身份,即两个列表中是否存在相同的对象,则可以将该id()函数用作排序键。

sorted(a, key=id) == sorted(b, key=id)

(在 Python 2.x 中,您实际上不需要 key= 参数,因为您可以将任何对象与任何对象进行比较。顺序是任意的但稳定的,因此很适合这个目的;对象以什么顺序排列并不重要,重要的是两个列表采用同样的排序。但在 Python 3 中,很多情况下不允许比较不同类型的对象,例如不能将字符串与整数比较,因此如果列表中会有多种类型的对象,最好显式使用对象的 ID。)

另一方面,如果要按值比较列表中的对象,首先需要定义“值”对这些对象意味着什么。然后,您需要以某种方式把它作为排序键提供(对于 Python 3,还需要是类型一致的键)。一种适用于许多任意对象的可行方式是按它们的 repr() 排序。当然,对于大列表,构建这些 repr() 字符串可能会浪费大量额外的时间和内存。

sorted(a, key=repr) == sorted(b, key=repr)

如果这些对象都是您自己定义的类型,则可以在它们上定义 __lt__(),让对象知道如何与其他对象比较。这样您就可以直接对它们排序,而不必操心 key= 参数。当然,您也可以定义 __hash__() 并改用 Counter,那样会更快。

The best way to do this is by sorting the lists and comparing them. (Using Counter won’t work with objects that aren’t hashable.) This is straightforward for integers:

sorted(a) == sorted(b)

It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.

sorted(a, key=id) == sorted(b, key=id)

(In Python 2.x you don’t actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn’t matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances — for example, you can’t compare strings to integers — so if you will have objects of various types, best to explicitly use the object’s ID.)

If you want to compare the objects in the list by value, on the other hand, first you need to define what “value” means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.

sorted(a, key=repr) == sorted(b, key=repr)

If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.
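
As a sketch of that last point (my example, with a hypothetical Point class): defining __eq__ and __hash__ makes the objects usable with Counter, and defining __lt__ would similarly enable plain sorted():

from collections import Counter

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash((self.x, self.y))

a = [Point(1, 2), Point(3, 4)]
b = [Point(3, 4), Point(1, 2)]
print(Counter(a) == Counter(b))  # True -- order doesn't matter, values do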


回答 4

https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual

assertCountEqual(first, second, msg=None)

测试序列首先包含与第二序列相同的元素,而不管其顺序如何。否则,将生成一条错误消息,列出序列之间的差异。

比较 first 和 second 时,重复的元素不会被忽略。它会验证每个元素在两个序列中的出现次数是否相同。等效于:assertEqual(Counter(list(first)), Counter(list(second))),但也适用于不可哈希对象组成的序列。

3.2版中的新功能。

或2.7:https//docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual

If you have to do this in tests: https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual

assertCountEqual(first, second, msg=None)

Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.

Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.

New in version 3.2.

or in 2.7: https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual

Outside of tests I would recommend the Counter method.
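
A minimal sketch of using it inside a test case (my example):

import unittest

class TestOrderInsensitiveEquality(unittest.TestCase):
    def test_same_elements_any_order(self):
        a = [1, 2, 3, 1, 2, 3]
        b = [3, 2, 1, 3, 2, 1]
        self.assertCountEqual(a, b)  # passes: same elements and counts, order ignored

if __name__ == "__main__":
    unittest.main()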


回答 5

如果列表包含不可散列的项(例如对象列表),则可以使用Counter Class和id()函数,例如:

from collections import Counter
...
if Counter(map(id,a)) == Counter(map(id,b)):
    print("Lists a and b contain the same objects")

If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:

from collections import Counter
...
if Counter(map(id,a)) == Counter(map(id,b)):
    print("Lists a and b contain the same objects")

回答 6

我希望以下代码可以在您的情况下工作:-

if ((len(a) == len(b)) and
   (all(i in a for i in b))):
    print 'True'
else:
    print 'False'

这将确保两个列表ab中的所有元素都是相同的,而不管它们的顺序是否相同。

为了更好地理解,请参考我在这个问题上的回答

I hope the below piece of code might work in your case :-

if ((len(a) == len(b)) and
   (all(i in a for i in b))):
    print 'True'
else:
    print 'False'

This will ensure that all the elements in both the lists a & b are same, regardless of whether they are in same order or not.

For better understanding, refer to my answer in this question


回答 7

如果要在测试环境中进行比较,请使用 assertCountEqual(a, b)(py>=3.2)或 assertItemsEqual(a, b)(2.7<=py<3.2)。

也适用于不可哈希对象的序列。

If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) and assertItemsEqual(a, b) (2.7<=py<3.2).

Works on sequences of unhashable objects too.


回答 8

设 a、b 为两个列表

def ass_equal(a, b):
    try:
        # try to remove every element of b from a; a.index() raises ValueError on failure
        map(lambda x: a.pop(a.index(x)), b)  # on Python 3, wrap this in list(...) so the map actually runs
        if len(a) == 0:  # if a is empty, it means b has removed them all
            return True
    except:
        return False  # b failed to remove some items from a

无需将它们设为可散列或排序。

Let a, b be lists

def ass_equal(a, b):
    try:
        # try to remove every element of b from a; a.index() raises ValueError on failure
        map(lambda x: a.pop(a.index(x)), b)  # on Python 3, wrap this in list(...) so the map actually runs
        if len(a) == 0:  # if a is empty, it means b has removed them all
            return True
    except:
        return False  # b failed to remove some items from a

No need to make them hashable or sort them.


回答 9

使用该unittest模块可为您提供一种干净而标准的方法。

import unittest

test_object = unittest.TestCase()
test_object.assertCountEqual(a, b)

Using the unittest module gives you a clean and standard approach.

import unittest

test_object = unittest.TestCase()
test_object.assertCountEqual(a, b)

两个列表之间的组合?

问题:两个列表之间的组合?

有一阵子了,我始终无法理清我正在尝试实现的一个算法。基本上,我有两个列表,想要获得这两个列表的所有组合。

我可能没有解释正确,所以这里有个例子。

name = 'a', 'b'
number = 1, 2

在这种情况下的输出将是:

1.  A1 B2
2.  B1 A2

棘手的部分是,“名称”变量中的项目可能比“数字”变量中的项目更多(数字将始终等于或小于名称变量)。

我不清楚该如何生成所有组合(用嵌套的 for 循环吗?),而当名称比数字多时,如何在名称变量中移动各项的逻辑更让我困惑。

我不是最好的程序员,但是如果有人可以帮助我阐明实现这一目标的逻辑/算法,我想可以试一试。所以我只是停留在嵌套的循环上。

更新:

这是带有3个变量和2个数字的输出:

name = 'a', 'b', 'c'
number = 1, 2

输出:

1.  A1 B2
2.  B1 A2
3.  A1 C2
4.  C1 A2
5.  B1 C2
6.  C1 B2

I’m having trouble wrapping my head around an algorithm I’m trying to implement. I have two lists and want to take particular combinations from the two lists.

Here’s an example.

names = 'a', 'b'
numbers = 1, 2

the output in this case would be:

[('a', 1), ('b', 2)]
[('b', 1), ('a', 2)]

I might have more names than numbers, i.e. len(names) >= len(numbers). Here’s an example with 3 names and 2 numbers:

names = 'a', 'b', 'c'
numbers = 1, 2

output:

[('a', 1), ('b', 2)]
[('b', 1), ('a', 2)]
[('a', 1), ('c', 2)]
[('c', 1), ('a', 2)]
[('b', 1), ('c', 2)]
[('c', 1), ('b', 2)]

回答 0

注意:此答案是针对上面提出的特定问题的。如果您来自 Google,只是想找一种在 Python 中求笛卡尔积的方法,那么 itertools.product 或简单的列表推导可能才是您要找的,请参见其他答案。


假设 len(list1) >= len(list2)。那么您想要的似乎是:从 list1 中取出所有长度为 len(list2) 的排列,并将它们与 list2 中的项配对。用 Python 实现:

import itertools
list1=['a','b','c']
list2=[1,2]

[list(zip(x,list2)) for x in itertools.permutations(list1,len(list2))]

返回结果

[[('a', 1), ('b', 2)], [('a', 1), ('c', 2)], [('b', 1), ('a', 2)], [('b', 1), ('c', 2)], [('c', 1), ('a', 2)], [('c', 1), ('b', 2)]]

Note: This answer is for the specific question asked above. If you are here from Google and just looking for a way to get a Cartesian product in Python, itertools.product or a simple list comprehension may be what you are looking for – see the other answers.


Suppose len(list1) >= len(list2). Then what you appear to want is to take all permutations of length len(list2) from list1 and match them with items from list2. In python:

import itertools
list1=['a','b','c']
list2=[1,2]

[list(zip(x,list2)) for x in itertools.permutations(list1,len(list2))]

Returns

[[('a', 1), ('b', 2)], [('a', 1), ('c', 2)], [('b', 1), ('a', 2)], [('b', 1), ('c', 2)], [('c', 1), ('a', 2)], [('c', 1), ('b', 2)]]

回答 1

最简单的方法是使用itertools.product

a = ["foo", "melon"]
b = [True, False]
c = list(itertools.product(a, b))
>> [("foo", True), ("foo", False), ("melon", True), ("melon", False)]

The simplest way is to use itertools.product:

a = ["foo", "melon"]
b = [True, False]
c = list(itertools.product(a, b))
>> [("foo", True), ("foo", False), ("melon", True), ("melon", False)]

回答 2

可能比上面最简单的方法更简单:

>>> a = ["foo", "bar"]
>>> b = [1, 2, 3]
>>> [(x,y) for x in a for y in b]  # for a list
[('foo', 1), ('foo', 2), ('foo', 3), ('bar', 1), ('bar', 2), ('bar', 3)]
>>> ((x,y) for x in a for y in b)  # for a generator if you worry about memory or time complexity.
<generator object <genexpr> at 0x1048de850>

无需任何导入

May be simpler than the simplest one above:

>>> a = ["foo", "bar"]
>>> b = [1, 2, 3]
>>> [(x,y) for x in a for y in b]  # for a list
[('foo', 1), ('foo', 2), ('foo', 3), ('bar', 1), ('bar', 2), ('bar', 3)]
>>> ((x,y) for x in a for y in b)  # for a generator if you worry about memory or time complexity.
<generator object <genexpr> at 0x1048de850>

without any import


回答 3

我当时想要的是一个列表与自身相乘、且只包含唯一组合的结果,而下面这个函数正好提供了这个功能。

import itertools
itertools.combinations(list, n_times)

下面是 Python 文档中关于 itertools 的摘录,可能会帮助您找到所需的内容。

Combinatoric generators:

Iterator                                 | Results
-----------------------------------------+----------------------------------------
product(p, q, ... [repeat=1])            | cartesian product, equivalent to a 
                                         |   nested for-loop
-----------------------------------------+----------------------------------------
permutations(p[, r])                     | r-length tuples, all possible 
                                         |   orderings, no repeated elements
-----------------------------------------+----------------------------------------
combinations(p, r)                       | r-length tuples, in sorted order, no 
                                         |   repeated elements
-----------------------------------------+----------------------------------------
combinations_with_replacement(p, r)      | r-length tuples, in sorted order, 
                                         | with repeated elements
-----------------------------------------+----------------------------------------
product('ABCD', repeat=2)                | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD
permutations('ABCD', 2)                  | AB AC AD BA BC BD CA CB CD DA DB DC
combinations('ABCD', 2)                  | AB AC AD BC BD CD
combinations_with_replacement('ABCD', 2) | AA AB AC AD BB BC BD CC CD DD

I was looking for a list multiplied by itself with only unique combinations, which is provided as this function.

import itertools
itertools.combinations(list, n_times)

Here is an excerpt from the Python docs on itertools that might help you find what you’re looking for.

Combinatoric generators:

Iterator                                 | Results
-----------------------------------------+----------------------------------------
product(p, q, ... [repeat=1])            | cartesian product, equivalent to a 
                                         |   nested for-loop
-----------------------------------------+----------------------------------------
permutations(p[, r])                     | r-length tuples, all possible 
                                         |   orderings, no repeated elements
-----------------------------------------+----------------------------------------
combinations(p, r)                       | r-length tuples, in sorted order, no 
                                         |   repeated elements
-----------------------------------------+----------------------------------------
combinations_with_replacement(p, r)      | r-length tuples, in sorted order, 
                                         | with repeated elements
-----------------------------------------+----------------------------------------
product('ABCD', repeat=2)                | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD
permutations('ABCD', 2)                  | AB AC AD BA BC BD CA CB CD DA DB DC
combinations('ABCD', 2)                  | AB AC AD BC BD CD
combinations_with_replacement('ABCD', 2) | AA AB AC AD BB BC BD CC CD DD
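
A small usage example (my own) for the "list multiplied by itself with only unique combinations" case mentioned above:

import itertools

letters = ['a', 'b', 'c']
print(list(itertools.combinations(letters, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'c')] -- unique pairs, order within a pair doesn't matter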

回答 4

您可能想尝试一个单行列表理解:

>>> [name+number for name in 'ab' for number in '12']
['a1', 'a2', 'b1', 'b2']
>>> [name+number for name in 'abc' for number in '12']
['a1', 'a2', 'b1', 'b2', 'c1', 'c2']

You might want to try a one line list comprehension:

>>> [name+number for name in 'ab' for number in '12']
['a1', 'a2', 'b1', 'b2']
>>> [name+number for name in 'abc' for number in '12']
['a1', 'a2', 'b1', 'b2', 'c1', 'c2']

回答 5

找出大量列表的所有组合的最佳方法是:

import itertools
from pprint import pprint

inputdata = [
    ['a', 'b', 'c'],
    ['d'],
    ['e', 'f'],
]
result = list(itertools.product(*inputdata))
pprint(result)

结果将是:

[('a', 'd', 'e'),
 ('a', 'd', 'f'),
 ('b', 'd', 'e'),
 ('b', 'd', 'f'),
 ('c', 'd', 'e'),
 ('c', 'd', 'f')]

the best way to find out all the combinations for large number of lists is:

import itertools
from pprint import pprint

inputdata = [
    ['a', 'b', 'c'],
    ['d'],
    ['e', 'f'],
]
result = list(itertools.product(*inputdata))
pprint(result)

the result will be:

[('a', 'd', 'e'),
 ('a', 'd', 'f'),
 ('b', 'd', 'e'),
 ('b', 'd', 'f'),
 ('c', 'd', 'e'),
 ('c', 'd', 'f')]

回答 6

或者,对于短列表,可以用 KISS(保持简单)式的答案:

[(i, j) for i in list1 for j in list2]

性能不如itertools,但您使用的是python,因此性能已经不是您的头等大事…

我也喜欢所有其他答案!

Or the KISS answer for short lists:

[(i, j) for i in list1 for j in list2]

Not as performant as itertools but you’re using python so performance is already not your top concern…

I like all the other answers too!


回答 7

对interjay答案的微小改进,使结果成为一个扁平的列表。

>>> list3 = [zip(x,list2) for x in itertools.permutations(list1,len(list2))]
>>> import itertools
>>> chain = itertools.chain(*list3)
>>> list4 = list(chain)
[('a', 1), ('b', 2), ('a', 1), ('c', 2), ('b', 1), ('a', 2), ('b', 1), ('c', 2), ('c', 1), ('a', 2), ('c', 1), ('b', 2)]

来自此链接的参考

a tiny improvement for the answer from interjay, to make the result as a flatten list.

>>> list3 = [zip(x,list2) for x in itertools.permutations(list1,len(list2))]
>>> import itertools
>>> chain = itertools.chain(*list3)
>>> list4 = list(chain)
[('a', 1), ('b', 2), ('a', 1), ('c', 2), ('b', 1), ('a', 2), ('b', 1), ('c', 2), ('c', 1), ('a', 2), ('c', 1), ('b', 2)]

reference from this link


回答 8

没有itertools

[(list1[i], list2[j]) for i in xrange(len(list1)) for j in xrange(len(list2))]

Without itertools

[(list1[i], list2[j]) for i in xrange(len(list1)) for j in xrange(len(list2))]

回答 9

针对“给定两个列表,求从每个列表中各取一项组成配对的所有可能排列”这一问题,只使用基本的 Python 功能(即不使用 itertools),因此也很容易移植到其他编程语言:

def rec(a, b, ll, size):
    ret = []
    for i,e in enumerate(a):
        for j,f in enumerate(b):
            l = [e+f]
            new_l = rec(a[i+1:], b[:j]+b[j+1:], ll, size)
            if not new_l:
                ret.append(l)
            for k in new_l:
                l_k = l + k
                ret.append(l_k)
                if len(l_k) == size:
                    ll.append(l_k)
    return ret

a = ['a','b','c']
b = ['1','2']
ll = []
rec(a,b,ll, min(len(a),len(b)))
print(ll)

返回结果

[['a1', 'b2'], ['a1', 'c2'], ['a2', 'b1'], ['a2', 'c1'], ['b1', 'c2'], ['b2', 'c1']]

Answering the question “given two lists, find all possible permutations of pairs of one item from each list” and using basic Python functionality (i.e., without itertools) and, hence, making it easy to replicate for other programming languages:

def rec(a, b, ll, size):
    ret = []
    for i,e in enumerate(a):
        for j,f in enumerate(b):
            l = [e+f]
            new_l = rec(a[i+1:], b[:j]+b[j+1:], ll, size)
            if not new_l:
                ret.append(l)
            for k in new_l:
                l_k = l + k
                ret.append(l_k)
                if len(l_k) == size:
                    ll.append(l_k)
    return ret

a = ['a','b','c']
b = ['1','2']
ll = []
rec(a,b,ll, min(len(a),len(b)))
print(ll)

Returns

[['a1', 'b2'], ['a1', 'c2'], ['a2', 'b1'], ['a2', 'c1'], ['b1', 'c2'], ['b2', 'c1']]

回答 10

更好的答案仅适用于所提供列表的特定长度。

这是一个适用于任何长度输入的版本。就组合和置换的数学概念而言,这也使算法更加清晰。

from itertools import combinations, permutations
list1 = ['1', '2']
list2 = ['A', 'B', 'C']

num_elements = min(len(list1), len(list2))
list1_combs = list(combinations(list1, num_elements))
list2_perms = list(permutations(list2, num_elements))
result = [
  tuple(zip(perm, comb))
  for comb in list1_combs
  for perm in list2_perms
]

for idx, ((l11, l12), (l21, l22)) in enumerate(result):
  print(f'{idx}: {l11}{l12} {l21}{l22}')

输出:

0: A1 B2
1: A1 C2
2: B1 A2
3: B1 C2
4: C1 A2
5: C1 B2

The better answers to this only work for specific lengths of lists that are provided.

Here’s a version that works for any lengths of input. It also makes the algorithm clear in terms of the mathematical concepts of combination and permutation.

from itertools import combinations, permutations
list1 = ['1', '2']
list2 = ['A', 'B', 'C']

num_elements = min(len(list1), len(list2))
list1_combs = list(combinations(list1, num_elements))
list2_perms = list(permutations(list2, num_elements))
result = [
  tuple(zip(perm, comb))
  for comb in list1_combs
  for perm in list2_perms
]

for idx, ((l11, l12), (l21, l22)) in enumerate(result):
  print(f'{idx}: {l11}{l12} {l21}{l22}')

This outputs:

0: A1 B2
1: A1 C2
2: B1 A2
3: B1 C2
4: C1 A2
5: C1 B2