首次出现大于当前值的值

Question 1

I have a 1D array in numpy and I want to find the position of the index where a value exceeds the value in numpy array.

E.g.

aa = range(-10,10)

Find position in aa where, the value 5 gets exceeded.

Question 2

This is a little faster (and looks nicer)

np.argmax(aa>5)

Since will stop at the first True (“In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.”) and doesn’t save another list.

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop

Question 3

given the sorted content of your array, there is an even faster method: searchsorted.

import time
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]

# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop

Question 4

I was also interested in this and I’ve compared all the suggested answers with perfplot. (Disclaimer: I’m the author of perfplot.)

If you know that the array you’re looking through is already sorted, then

numpy.searchsorted(a, alpha)

is for you. It’s O(log(n)) operation, i.e., the speed hardly depends on the size of the array. You can’t get faster than that.

If you don’t know anything about your array, you’re not going wrong with

numpy.argmax(a > alpha)

Already sorted:

Unsorted:

Code to reproduce the plot:

import numpy
import perfplot


alpha = 0.5
numpy.random.seed(0)


def argmax(data):
    return numpy.argmax(data > alpha)


def where(data):
    return numpy.where(data > alpha)[0][0]


def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]


def searchsorted(data):
    return numpy.searchsorted(data, alpha)


perfplot.save(
    "out.png",
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[argmax, where, nonzero, searchsorted],
    n_range=[2 ** k for k in range(2, 23)],
    xlabel="len(array)",
)

Question 5

In [34]: a=np.arange(-10,10)

In [35]: a
Out[35]:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

In [36]: np.where(a>5)
Out[36]: (array([16, 17, 18, 19]),)

In [37]: np.where(a>5)[0][0]
Out[37]: 16

Question 6

Arrays that have a constant step between elements

In case of a range or any other linearly increasing array you can simply calculate the index programmatically, no need to actually iterate over the array at all:

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('no value greater than {}'.format(val))
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    # For linearly decreasing arrays or constant arrays we only need to check
    # the first element, because if that does not satisfy the condition
    # no other element will.
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position > len(arr) - 1:
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

One could probably improve that a bit. I have made sure it works correctly for a few sample arrays and values but that doesn’t mean there couldn’t be mistakes in there, especially considering that it uses floats…

>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)[16]  # double check
6

>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15

Given that it can calculate the position without any iteration it will be constant time (O(1)) and can probably beat all other mentioned approaches. However it requires a constant step in the array, otherwise it will produce wrong results.

General solution using numba

A more general approach would be using a numba function:

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

That will work for any array but it has to iterate over the array, so in the average case it will be O(n):

>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16

Benchmark

Even though Nico Schlömer already provided some benchmarks I thought it might be useful to include my new solutions and to test for different “values”.

The test setup:

import numpy as np
import math
import numba as nb

def first_index_using_argmax(val, arr):
    return np.argmax(arr > val)

def first_index_using_where(val, arr):
    return np.where(arr > val)[0][0]

def first_index_using_nonzero(val, arr):
    return np.nonzero(arr > val)[0][0]

def first_index_using_searchsorted(val, arr):
    return np.searchsorted(arr, val) + 1

def first_index_using_min(val, arr):
    return np.min(np.where(arr > val))

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('empty array')
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position > len(arr) - 1:
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

funcs = [
    first_index_using_argmax, 
    first_index_using_min, 
    first_index_using_nonzero,
    first_index_calculate_range_like, 
    first_index_numba, 
    first_index_using_searchsorted, 
    first_index_using_where
]

from simple_benchmark import benchmark, MultiArgument

and the plots were generated using:

%matplotlib notebook
b.plot()

item is at the beginning

b = benchmark(
    funcs,
    {2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

The numba function performs best followed by the calculate-function and the searchsorted function. The other solutions perform much worse.

item is at the end

b = benchmark(
    funcs,
    {2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

For small arrays the numba function performs amazingly fast, however for bigger arrays it’s outperformed by the calculate-function and the searchsorted function.

item is at sqrt(len)

b = benchmark(
    funcs,
    {2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

This is more interesting. Again numba and the calculate function perform great, however this is actually triggering the worst case of searchsorted which really doesn’t work well in this case.

Comparison of the functions when no value satisfies the condition

Another interesting point is how these function behave if there is no value whose index should be returned:

arr = np.ones(100)
value = 2

for func in funcs:
    print(func.__name__)
    try:
        print('-->', func(value, arr))
    except Exception as e:
        print('-->', e)

With this result:

first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0

Searchsorted, argmax, and numba simply return a wrong value. However searchsorted and numba return an index that is not a valid index for the array.

The functions where, min, nonzero and calculate throw an exception. However only the exception for calculate actually says anything helpful.

That means one actually has to wrap these calls in an appropriate wrapper function that catches exceptions or invalid return values and handle appropriately, at least if you aren’t sure if the value could be in the array.

Note: The calculate and searchsorted options only work in special conditions. The “calculate” function requires a constant step and the searchsorted requires the array to be sorted. So these could be useful in the right circumstances but aren’t general solutions for this problem. In case you’re dealing with sorted Python lists you might want to take a look at the bisect module instead of using Numpys searchsorted.

Question 7

I’d like to propose

np.min(np.append(np.where(aa>5)[0],np.inf))

This will return the smallest index where the condition is met, while returning infinity if the condition is never met (and where returns an empty array).

Question 8

I would go with

i = np.min(np.where(V >= x))

where V is vector (1d array), x is the value and i is the resulting index.

首次出现大于当前值的值

问题：首次出现大于当前值的值

回答 0

回答 1

回答 2

回答 3

回答 4

元素之间步长恒定的数组

使用numba的一般解决方案

基准测试

项目在开始

项目在末尾

项目在sqrt（len）

没有值满足条件时的功能比较

Arrays that have a constant step between elements

General solution using numba

Benchmark

item is at the beginning

item is at the end

item is at sqrt(len)

Comparison of the functions when no value satisfies the condition

回答 5

回答 6

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

str.startswith带有要测试的字符串列表

列出带有boto3的存储桶的内容

如何在python中使用networkx绘制有向图？

Python + Django页面重定向

Python-何时使用文件vs打开

为什么Python的无穷大散列具有π的数字？

首次出现大于当前值的值

问题：首次出现大于当前值的值

回答 0

回答 1

回答 2

回答 3

回答 4

元素之间步长恒定的数组

使用numba的一般解决方案

基准测试

项目在开始

项目在末尾

项目在sqrt（len）

没有值满足条件时的功能比较

Arrays that have a constant step between elements

General solution using numba

Benchmark

item is at the beginning

item is at the end

item is at sqrt(len)

Comparison of the functions when no value satisfies the condition

回答 5

回答 6

相关文章

排行榜展示

文章展示