滚动或滑动窗口迭代器?

问题:滚动或滑动窗口迭代器?

我需要在序列/迭代器/生成器上可迭代的滚动窗口(又称滑动窗口)。可以将默认Python迭代视为一种特殊情况,其中窗口长度为1。我当前正在使用以下代码。有没有人有更多的Python风格,更少的冗长或更有效的方法来执行此操作?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:

   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. Default Python iteration can be considered a special case, where the window length is 1. I’m currently using the following code. Does anyone have a more Pythonic, less verbose, or more efficient method for doing this?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:

   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

回答 0

Python文档的旧版本中有一个带有itertools示例

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

文档中的一个更加简洁,itertools我想它可以起到更大的作用。

There’s one in an old version of the Python docs with itertools examples:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

The one from the docs is a little more succinct and uses itertools to greater effect I imagine.


回答 1

这似乎是为a量身定制的,collections.deque因为您实际上具有FIFO(添加到一端,从另一端移除)。但是,即使使用a list,也不应切片两次;相反,您可能应该只pop(0)从列表和append()新项目开始。

这是一个优化的基于双端队列的实现,该实现以原始格式为基础:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

在我的测试中,它在大多数情况下都轻而易举地击败了此处发布的所有其他内容,尽管pillmuncher的tee版本在大型可迭代项和小窗口方面均胜过了它。在较大的窗户上,deque原始速度再次向前拉。

deque与中的列表或元组相比,对中单个项目的访问可能更快或更慢。(如果使用负索引,则开头附近的项目会更快,或者结尾附近的项目会更快。)我将a sum(w)放入循环的主体中;这会影响双端队列的强度(从一个项目到另一个项目的迭代速度很快,因此此循环比第二个最快的方法pillmuncher的运行速度快了整整20%)。当我将其更改为单独查找并在10个窗口中添加项目时,表格翻转了,该tee方法快了20%。通过使用最后五个词的负索引,我能够恢复一定的速度,但tee速度仍然要快一些。总的来说,我估计这两种方法对于大多数用途来说都足够快,如果您需要更多性能,请配置并选择最有效的方法。

This seems tailor-made for a collections.deque since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list you shouldn’t be slicing twice; instead, you should probably just pop(0) from the list and append() the new item.

Here is an optimized deque-based implementation patterned after your original:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

In my tests it handily beats everything else posted here most of the time, though pillmuncher’s tee version beats it for large iterables and small windows. On larger windows, the deque pulls ahead again in raw speed.

Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque’s strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher’s). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.


回答 2

我喜欢tee()

from itertools import tee, izip

def window(iterable, size):
    iters = tee(iterable, size)
    for i in xrange(1, size):
        for each in iters[i:]:
            next(each, None)
    return izip(*iters)

for each in window(xrange(6), 3):
    print list(each)

给出:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

I like tee():

from itertools import tee, izip

def window(iterable, size):
    iters = tee(iterable, size)
    for i in xrange(1, size):
        for each in iters[i:]:
            next(each, None)
    return izip(*iters)

for each in window(xrange(6), 3):
    print list(each)

gives:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

回答 3

下面是添加了对泛化stepfillvalue参数:

from collections import deque
from itertools import islice

def sliding_window(iterable, size=2, step=1, fillvalue=None):
    if size < 0 or step < 1:
        raise ValueError
    it = iter(iterable)
    q = deque(islice(it, size), maxlen=size)
    if not q:
        return  # empty iterable or size == 0
    q.extend(fillvalue for _ in range(size - len(q)))  # pad to size
    while True:
        yield iter(q)  # iter() to avoid accidental outside modifications
        try:
            q.append(next(it))
        except StopIteration: # Python 3.5 pep 479 support
            return
        q.extend(next(it, fillvalue) for _ in range(step - 1))

每次迭代size时,它会在滚动step位置一次生成块项目,并fillvalue在必要时填充每个块。示例size=4, step=3, fillvalue='*'

 [a b c d]e f g h i j k l m n o p q r s t u v w x y z
  a b c[d e f g]h i j k l m n o p q r s t u v w x y z
  a b c d e f[g h i j]k l m n o p q r s t u v w x y z
  a b c d e f g h i[j k l m]n o p q r s t u v w x y z
  a b c d e f g h i j k l[m n o p]q r s t u v w x y z
  a b c d e f g h i j k l m n o[p q r s]t u v w x y z
  a b c d e f g h i j k l m n o p q r[s t u v]w x y z
  a b c d e f g h i j k l m n o p q r s t u[v w x y]z
  a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]

有关该step参数用例的示例,请参阅有效地在python中处理大型.txt文件

Here’s a generalization that adds support for step, fillvalue parameters:

from collections import deque
from itertools import islice

def sliding_window(iterable, size=2, step=1, fillvalue=None):
    if size < 0 or step < 1:
        raise ValueError
    it = iter(iterable)
    q = deque(islice(it, size), maxlen=size)
    if not q:
        return  # empty iterable or size == 0
    q.extend(fillvalue for _ in range(size - len(q)))  # pad to size
    while True:
        yield iter(q)  # iter() to avoid accidental outside modifications
        try:
            q.append(next(it))
        except StopIteration: # Python 3.5 pep 479 support
            return
        q.extend(next(it, fillvalue) for _ in range(step - 1))

It yields in chunks size items at a time rolling step positions per iteration padding each chunk with fillvalue if necessary. Example for size=4, step=3, fillvalue='*':

 [a b c d]e f g h i j k l m n o p q r s t u v w x y z
  a b c[d e f g]h i j k l m n o p q r s t u v w x y z
  a b c d e f[g h i j]k l m n o p q r s t u v w x y z
  a b c d e f g h i[j k l m]n o p q r s t u v w x y z
  a b c d e f g h i j k l[m n o p]q r s t u v w x y z
  a b c d e f g h i j k l m n o[p q r s]t u v w x y z
  a b c d e f g h i j k l m n o p q r[s t u v]w x y z
  a b c d e f g h i j k l m n o p q r s t u[v w x y]z
  a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]

For an example of use case for the step parameter, see Processing a large .txt file in python efficiently.


回答 4

有一个库可以完全满足您的需求:

import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))

Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

There is a library which does exactly what you need:

import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))

Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

回答 5

只是一个快速的贡献。

由于当前的python文档在itertool示例中没有“窗口”(即,位于http://docs.python.org/library/itertools.html的底部),因此下面是基于grouper代码的代码段,是给出的示例之一:

import itertools as it
def window(iterable, size):
    shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
    return it.izip(*shiftedStarts)

基本上,我们创建了一系列切片式迭代器,每个迭代器的起点都向前一点。然后,将它们压缩在一起。注意,此函数返回一个生成器(它本身不是直接生成器)。

就像上面的appending-element和advancing-iterator版本一样,性能(即最佳)随列表大小和窗口大小而变化。我喜欢这一行,因为它是两行的(可能是一行,但我更喜欢命名概念)。

原来,以上代码是错误的。如果传递给iterable的参数是一个序列,则有效,但如果它是迭代器,则无效。如果它是一个迭代器,则在islice调用之间共享同一迭代器(但不发送),这会严重破坏事情。

这是一些固定的代码:

import itertools as it
def window(iterable, size):
    itrs = it.tee(iterable, size)
    shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
    return it.izip(*shiftedStarts)

此外,还有其他版本的书籍。该版本没有复制一个迭代器,而是先进行多次复制,而是随着我们向前移动起始位置,对每个迭代器进行成对复制。因此,迭代器t为“完整”的迭代器提供了从t的起点,并且为创建迭代器t + 1提供了基础:

import itertools as it
def window4(iterable, size):
    complete_itr, incomplete_itr = it.tee(iterable, 2)
    iters = [complete_itr]
    for i in xrange(1, size):
        incomplete_itr.next()
        complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
        iters.append(complete_itr)
    return it.izip(*iters)

Just a quick contribution.

Since the current python docs don’t have “window” in the itertool examples (i.e., at the bottom of http://docs.python.org/library/itertools.html), here’s an snippet based on the code for grouper which is one of the examples given:

import itertools as it
def window(iterable, size):
    shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
    return it.izip(*shiftedStarts)

Basically, we create a series of sliced iterators, each with a starting point one spot further forward. Then, we zip these together. Note, this function returns a generator (it is not directly a generator itself).

Much like the appending-element and advancing-iterator versions above, the performance (i.e., which is best) varies with list size and window size. I like this one because it is a two-liner (it could be a one-liner, but I prefer naming concepts).

It turns out that the above code is wrong. It works if the parameter passed to iterable is a sequence but not if it is an iterator. If it is an iterator, the same iterator is shared (but not tee’d) among the islice calls and this breaks things badly.

Here is some fixed code:

import itertools as it
def window(iterable, size):
    itrs = it.tee(iterable, size)
    shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
    return it.izip(*shiftedStarts)

Also, one more version for the books. Instead of copying an iterator and then advancing copies many times, this version makes pairwise copies of each iterator as we move the starting position forward. Thus, iterator t provides both the “complete” iterator with starting point at t and also the basis for creating iterator t + 1:

import itertools as it
def window4(iterable, size):
    complete_itr, incomplete_itr = it.tee(iterable, 2)
    iters = [complete_itr]
    for i in xrange(1, size):
        incomplete_itr.next()
        complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
        iters.append(complete_itr)
    return it.izip(*iters)

回答 6

只是为了展示如何结合itertools的食谱,我延长了pairwise配方尽可能直接回window用的配方consume配方:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

def window(iterable, n=2):
    "s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
    iters = tee(iterable, n)
    # Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
    # slower for larger window sizes, while saving only small fixed "noop" cost
    for i, it in enumerate(iters):
        consume(it, i)
    return zip(*iters)

window配方是一样的pairwise,它只是代替在第二的单个元素“消费” tee-ed迭代器上逐步增加消耗n - 1迭代器。使用(consume而不是包装每个迭代器islice)速度稍快(对于足够大的可迭代对象),因为您只isliceconsume阶段中支付包装开销,而不是在提取每个窗口值的过程中支付包装开销(因此,它受限制n,而不是其中的项目数限制)iterable)。

在性能方面,与其他一些解决方案相比,这是相当不错的(并且比我测试过的其他任何解决方案都要好)。使用ipython %timeit魔术在Python 3.5.0,Linux x86-64上进行了测试。

kindall的deque解决方案,通过使用islice而不是原始生成器表达式来调整性能/正确性,并测试了结果长度,以便当iterable的长度小于window时,以及maxlendeque位置传递而不是在按关键字(对于较小的输入产生令人惊讶的差异):

>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop

与以前采用的kindall解决方案相同,但都进行了yield win更改,yield tuple(win)因此生成器的存储结果可以正常工作,而所有存储的结果实际上都不是最新结果的视图(在这种情况下,所有其他合理的解决方案都是安全的),并添加tuple=tuple到函数定义中移动应用tupleBLEGBL

>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop

consume上面显示的基于解决方案:

>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop

与相同consume,但内联的else情况是consume避免函数调用和n is None测试以减少运行时间,尤其是对于小型输入(其中设置开销是工作中有意义的一部分)的情况:

>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop

(旁注:其上的变体pairwise使用tee默认参数2重复制作嵌套tee对象,因此任何给定的迭代器仅前进一次,而不是独立消耗越来越多的次数,类似于MrDrFenner的答案类似于非内联的consume并且比consume所有测试中的内联速度慢,因此为简洁起见,我省略了这些结果)。

如您所见,如果您不关心调用方是否需要存储结果,我的kindall解决方案的优化版本在大多数情况下都会获胜,除非是在“大迭代,小窗口大小的情况下”(内联consume获胜) ); 随着可迭代大小的增加,它会迅速降级,而随着窗口大小的增加,它根本不会降级(其他解决方案随着可迭代大小的增加而降级得更慢,但随着窗口大小的增加而降级)。它甚至可以通过包裹入来适应“需要元组”的情况map(tuple, ...),它的运行速度比将小结添加到函数中的速度稍慢,但它是微不足道的(需要花费1-5%的时间),并让您保持运行速度更快的灵活性当您可以容忍重复返回相同的值时。

如果您需要安全地防止退货被存储,则内联consume赢得所有输入最小输入大小)(非内联consume则稍慢一些,但缩放比例类似)。基于deque&捆绑的解决方案仅因最小的安装成本而获得了成功,因为安装成本较小,并且收益很小。随着迭代时间的延长,它的性能将严重下降。

为了记录在案,kindall的解决方案,它的改编版yield小号tuple的I所用的是:

def windowkindalltupled(iterable, n=2, tuple=tuple):
    it = iter(iterable)
    win = deque(islice(it, n), n)
    if len(win) < n:
        return
    append = win.append
    yield tuple(win)
    for e in it:
        append(e)
        yield tuple(win)

删除tuple函数定义行中的缓存并tuple在每个函数中使用yield以获得更快但不太安全的版本。

Just to show how you can combine itertools recipes, I’m extending the pairwise recipe as directly as possible back into the window recipe using the consume recipe:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

def window(iterable, n=2):
    "s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
    iters = tee(iterable, n)
    # Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
    # slower for larger window sizes, while saving only small fixed "noop" cost
    for i, it in enumerate(iters):
        consume(it, i)
    return zip(*iters)

The window recipe is the same as for pairwise, it just replaces the single element “consume” on the second tee-ed iterator with progressively increasing consumes on n - 1 iterators. Using consume instead of wrapping each iterator in islice is marginally faster (for sufficiently large iterables) since you only pay the islice wrapping overhead during the consume phase, not during the process of extracting each window-ed value (so it’s bounded by n, not the number of items in iterable).

Performance-wise, compared to some other solutions, this is pretty good (and better than any of the other solutions I tested as it scales). Tested on Python 3.5.0, Linux x86-64, using ipython %timeit magic.

kindall’s the deque solution, tweaked for performance/correctness by using islice instead of a home-rolled generator expression and testing the resulting length so it doesn’t yield results when the iterable is shorter than the window, as well as passing the maxlen of the deque positionally instead of by keyword (makes a surprising difference for smaller inputs):

>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop

Same as previous adapted kindall solution, but with each yield win changed to yield tuple(win) so storing results from the generator works without all stored results really being a view of the most recent result (all other reasonable solutions are safe in this scenario), and adding tuple=tuple to the function definition to move use of tuple from the B in LEGB to the L:

>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop

consume-based solution shown above:

>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop

Same as consume, but inlining else case of consume to avoid function call and n is None test to reduce runtime, particularly for small inputs where the setup overhead is a meaningful part of the work:

>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop

(Side-note: A variant on pairwise that uses tee with the default argument of 2 repeatedly to make nested tee objects, so any given iterator is only advanced once, not independently consumed an increasing number of times, similar to MrDrFenner’s answer is similar to non-inlined consume and slower than the inlined consume on all tests, so I’ve omitted it those results for brevity).

As you can see, if you don’t care about the possibility of the caller needing to store results, my optimized version of kindall’s solution wins most of the time, except in the “large iterable, small window size case” (where inlined consume wins); it degrades quickly as the iterable size increases, while not degrading at all as the window size increases (every other solution degrades more slowly for iterable size increases, but also degrades for window size increases). It can even be adapted for the “need tuples” case by wrapping in map(tuple, ...), which runs ever so slightly slower than putting the tupling in the function, but it’s trivial (takes 1-5% longer) and lets you keep the flexibility of running faster when you can tolerate repeatedly returning the same value.

If you need safety against returns being stored, inlined consume wins on all but the smallest input sizes (with non-inlined consume being slightly slower but scaling similarly). The deque & tupling based solution wins only for the smallest inputs, due to smaller setup costs, and the gain is small; it degrades badly as the iterable gets longer.

For the record, the adapted version of kindall’s solution that yields tuples I used was:

def windowkindalltupled(iterable, n=2, tuple=tuple):
    it = iter(iterable)
    win = deque(islice(it, n), n)
    if len(win) < n:
        return
    append = win.append
    yield tuple(win)
    for e in it:
        append(e)
        yield tuple(win)

Drop the caching of tuple in the function definition line and the use of tuple in each yield to get the faster but less safe version.


回答 7

我将以下代码用作一个简单的滑动窗口,该窗口使用生成器来大大提高可读性。根据我的经验,到目前为止,它的速度已经足够用于生物信息学序列分析。

我将其包含在此处是因为我尚未看到使用此方法。同样,我对它的比较性能没有任何要求。

def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit
    numOfChunks = ((len(sequence)-winSize)/step)+1

    # Do the work
    for i in range(0,numOfChunks*step,step):
        yield sequence[i:i+winSize]

I use the following code as a simple sliding window that uses generators to drastically increase readability. Its speed has so far been sufficient for use in bioinformatics sequence analysis in my experience.

I include it here because I didn’t see this method used yet. Again, I make no claims about its compared performance.

def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit
    numOfChunks = ((len(sequence)-winSize)/step)+1

    # Do the work
    for i in range(0,numOfChunks*step,step):
        yield sequence[i:i+winSize]

回答 8

def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]

回答 9

双端队列窗口的略微修改版本,以使其成为真正的滚动窗口。这样它就开始只用一个元素填充,然后增长到最大窗口大小,然后随着左边缘的临近而缩小:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

这给

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]

a slightly modified version of the deque window, to make it a true rolling window. So that it starts being populated with just one element, then grows to it’s maximum window size, and then shrinks as it’s left edge comes near the end:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

this gives

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]

回答 10

def rolling_window(list, degree):
    for i in range(len(list)-degree+1):
        yield [list[i+o] for o in range(degree)]

进行滚动平均功能

def rolling_window(list, degree):
    for i in range(len(list)-degree+1):
        yield [list[i+o] for o in range(degree)]

Made this for a rolling average function


回答 11

为什么不

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

它记录在Python doc中。您可以轻松地将其扩展到更大的窗口。

why not

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

It is documented in Python doc . You can easily extend it to wider window.


回答 12

多个迭代器!

def window(seq, size, step=1):
    # initialize iterators
    iters = [iter(seq) for i in range(size)]
    # stagger iterators (without yielding)
    [next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
    while(True):
        yield [next(i) for i in iters]
        # next line does nothing for step = 1 (skips iterations for step > 1)
        [next(i) for i in iters for j in range(step-1)]

next(it)StopIteration当序列完成时引发,并且由于超出我的一些凉爽原因,此处的yield语句除外,并且函数返回,而忽略了未形成完整窗口的剩余值。

无论如何,这是最低线的解决方案还没有,其唯一的要求是seq实现无论是__iter____getitem__和不依赖于itertoolscollections除了@ dansalmo的解决方案:)

Multiple iterators!

def window(seq, size, step=1):
    # initialize iterators
    iters = [iter(seq) for i in range(size)]
    # stagger iterators (without yielding)
    [next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
    while(True):
        yield [next(i) for i in iters]
        # next line does nothing for step = 1 (skips iterations for step > 1)
        [next(i) for i in iters for j in range(step-1)]

next(it) raises StopIteration when the sequence is finished, and for some cool reason that’s beyond me, the yield statement here excepts it and the function returns, ignoring the leftover values that don’t form a full window.

Anyway, this is the least-lines solution yet whose only requirement is that seq implement either __iter__ or __getitem__ and doesn’t rely on itertools or collections besides @dansalmo’s solution :)


回答 13

让我们变得懒惰吧!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

Let’s make it lazy!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

回答 14

#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size), 
strides = arr.strides*2)

"""Example output:

  [0, 1, 2]
  [1, 2, 3]
  [2, 3, 4]
  [3, 4, 5]

“”

#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size), 
strides = arr.strides*2)

"""Example output:

  [0, 1, 2]
  [1, 2, 3]
  [2, 3, 4]
  [3, 4, 5]

“””


回答 15

我测试了一些解决方案,然后提出了一个解决方案,发现我提出的解决方案是最快的,因此我想与大家分享。

import itertools
import sys

def windowed(l, stride):
    return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])

I tested a few solutions and one I came up with and found the one I came up with to be the fastest so I thought I would share it.

import itertools
import sys

def windowed(l, stride):
    return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])

回答 16

>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

回答 17

如何使用以下内容:

mylist = [1, 2, 3, 4, 5, 6, 7]

def sliding_window(l, window_size=2):
    if window_size > len(l):
        raise ValueError("Window size must be smaller or equal to the number of elements in the list.")

    t = []
    for i in xrange(0, window_size):
        t.append(l[i:])

    return zip(*t)

print sliding_window(mylist, 3)

输出:

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]

How about using the following:

mylist = [1, 2, 3, 4, 5, 6, 7]

def sliding_window(l, window_size=2):
    if window_size > len(l):
        raise ValueError("Window size must be smaller or equal to the number of elements in the list.")

    t = []
    for i in xrange(0, window_size):
        t.append(l[i:])

    return zip(*t)

print sliding_window(mylist, 3)

Output:

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]

回答 18

这是一个古老的问题,但是对于那些仍然感兴趣的人来说,在页面中使用生成器可以很好地实现窗口滑块的实现(作者:Adrian Rosebrock)。

它是OpenCV的一种实现,但是您可以轻松地将其用于其他目的。对于急切的人,我将代码粘贴在这里,但是为了更好地理解它,我建议访问原始页面。

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

提示:.shape迭代生成器时,您可以检查窗口的,以丢弃不符合您要求的窗口

干杯

This is an old question but for those still interested there is a great implementation of a window slider using generators in this page (by Adrian Rosebrock).

It is an implementation for OpenCV however you can easily use it for any other purpose. For the eager ones i’ll paste the code here but to understand it better I recommend visiting the original page.

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

Tip: You can check the .shape of the window when iterating the generator to discard those that do not meet your requirements

Cheers


回答 19

修改了DiPaolo的答案以允许任意填充和可变步长

import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(itertools.islice(it, n))    
    if len(result) == n:
        yield result
    while True:        
#         for elem in it:        
        elem = tuple( next(it, fill) for _ in range(step))
        result = result[step:] + elem        
        if elem[-1] is fill:
            if keep:
                yield result
            break
        yield result

Modified DiPaolo’s answer to allow arbitrary fill and variable step size

import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(itertools.islice(it, n))    
    if len(result) == n:
        yield result
    while True:        
#         for elem in it:        
        elem = tuple( next(it, fill) for _ in range(step))
        result = result[step:] + elem        
        if elem[-1] is fill:
            if keep:
                yield result
            break
        yield result

回答 20

这是一行代码。我对它进行了计时,它与最佳答案的性能相当,并随着seq的增大而逐渐变好,len(seq)= 20时变慢了20%,len(seq)= 10000时变慢了7%

zip(*[seq[i:(len(seq) - n - 1 + i)] for i in range(n)])

here is a one liner. I timed it and it’s comprable to the performance of the top answer and gets progressively better with larger seq from 20% slower with len(seq) = 20 and 7% slower with len(seq) = 10000

zip(*[seq[i:(len(seq) - n - 1 + i)] for i in range(n)])

回答 21

使用islice尝试我的简单,一种衬里,pythonic方式。但是,可能不是最佳效率。

from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

说明:通过使用window_size的islice创建窗口,并使用map在所有数组上进行迭代。

Trying my part, simple, one liner, pythonic way using islice. But, may not be optimally efficient.

from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

Explanation: Create window by using islice of window_size and iterate this operation using map over all array.


回答 22

深度学习中滑动窗口数据的优化功能

def SlidingWindow(X, window_length, stride):
    indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
    return X.take(indexer)

Optimized Function for sliding window data in Deep learning

def SlidingWindow(X, window_length, stride):
    indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
    return X.take(indexer)

to apply on multidimensional array

import numpy as np
def SlidingWindow(X, window_length, stride1):
    stride=  X.shape[1]*stride1
    window_length = window_length*X.shape[1]
    indexer = np.arange(window_length)[None, :] + stride1*np.arange(int(len(X)/stride1)-window_length-1)[:, None]
    return X.take(indexer)