标签归档:numpy-ndarray

如何使用python / numpy计算百分位数?

问题:如何使用python / numpy计算百分位数?

是否有一种方便的方法来计算序列或一维numpy数组的百分位数?

我正在寻找类似于Excel的百分位数功能的东西。

我查看了NumPy的统计资料参考,但找不到。我只能找到中位数(第50个百分位数),但没有更具体的内容。

Is there a convenient way to calculate percentiles for a sequence or single-dimensional numpy array?

I am looking for something similar to Excel’s percentile function.

I looked in NumPy’s statistics reference, and couldn’t find this. All I could find is the median (50th percentile), but not something more specific.


回答 0

您可能对SciPy Stats软件包感兴趣。它具有您需要的百分位数功能和许多其他统计功能。

percentile() numpy太。

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, e.g median.
print p
3.0

这张票使我相信他们不会percentile()很快集成到numpy中。

You might be interested in the SciPy Stats package. It has the percentile function you’re after and many other statistical goodies.

percentile() is available in numpy too.

import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50) # return 50th percentile, e.g median.
print p
3.0

This ticket leads me to believe they won’t be integrating percentile() into numpy anytime soon.


回答 1

顺便说一句,有一个纯Python实现的百分位数功能,以防万一不想依赖scipy。该函数复制如下:

## {{{ http://code.activestate.com/recipes/511478/ (r1)
import math
import functools

def percentile(N, percent, key=lambda x:x):
    """
    Find the percentile of a list of values.

    @parameter N - is a list of values. Note N MUST BE already sorted.
    @parameter percent - a float value from 0.0 to 1.0.
    @parameter key - optional key function to compute value from each element of N.

    @return - the percentile of the values
    """
    if not N:
        return None
    k = (len(N)-1) * percent
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return key(N[int(k)])
    d0 = key(N[int(f)]) * (c-k)
    d1 = key(N[int(c)]) * (k-f)
    return d0+d1

# median is 50th percentile.
median = functools.partial(percentile, percent=0.5)
## end of http://code.activestate.com/recipes/511478/ }}}

By the way, there is a pure-Python implementation of percentile function, in case one doesn’t want to depend on scipy. The function is copied below:

## {{{ http://code.activestate.com/recipes/511478/ (r1)
import math
import functools

def percentile(N, percent, key=lambda x:x):
    """
    Find the percentile of a list of values.

    @parameter N - is a list of values. Note N MUST BE already sorted.
    @parameter percent - a float value from 0.0 to 1.0.
    @parameter key - optional key function to compute value from each element of N.

    @return - the percentile of the values
    """
    if not N:
        return None
    k = (len(N)-1) * percent
    f = math.floor(k)
    c = math.ceil(k)
    if f == c:
        return key(N[int(k)])
    d0 = key(N[int(f)]) * (c-k)
    d1 = key(N[int(c)]) * (k-f)
    return d0+d1

# median is 50th percentile.
median = functools.partial(percentile, percent=0.5)
## end of http://code.activestate.com/recipes/511478/ }}}

回答 2

import numpy as np
a = [154, 400, 1124, 82, 94, 108]
print np.percentile(a,95) # gives the 95th percentile
import numpy as np
a = [154, 400, 1124, 82, 94, 108]
print np.percentile(a,95) # gives the 95th percentile

回答 3

这是不使用numpy的方法,仅使用python计算百分位数。

import math

def percentile(data, percentile):
    size = len(data)
    return sorted(data)[int(math.ceil((size * percentile) / 100)) - 1]

p5 = percentile(mylist, 5)
p25 = percentile(mylist, 25)
p50 = percentile(mylist, 50)
p75 = percentile(mylist, 75)
p95 = percentile(mylist, 95)

Here’s how to do it without numpy, using only python to calculate the percentile.

import math

def percentile(data, percentile):
    size = len(data)
    return sorted(data)[int(math.ceil((size * percentile) / 100)) - 1]

p5 = percentile(mylist, 5)
p25 = percentile(mylist, 25)
p50 = percentile(mylist, 50)
p75 = percentile(mylist, 75)
p95 = percentile(mylist, 95)

回答 4

我通常看到的百分位数定义期望结果是所提供的列表中的值,在该列表下找到P值的百分比…这意味着结果必须来自集合,而不是集合元素之间的插值。为此,您可以使用更简单的功能。

def percentile(N, P):
    """
    Find the percentile of a list of values

    @parameter N - A list of values.  N must be sorted.
    @parameter P - A float value from 0.0 to 1.0

    @return - The percentile of the values.
    """
    n = int(round(P * len(N) + 0.5))
    return N[n-1]

# A = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# B = (15, 20, 35, 40, 50)
#
# print percentile(A, P=0.3)
# 4
# print percentile(A, P=0.8)
# 9
# print percentile(B, P=0.3)
# 20
# print percentile(B, P=0.8)
# 50

如果您希望从提供的列表中获取等于或低于P值百分比的值,请使用以下简单修改:

def percentile(N, P):
    n = int(round(P * len(N) + 0.5))
    if n > 1:
        return N[n-2]
    else:
        return N[0]

或使用@ijustlovemath建议的简化形式:

def percentile(N, P):
    n = max(int(round(P * len(N) + 0.5)), 2)
    return N[n-2]

The definition of percentile I usually see expects as a result the value from the supplied list below which P percent of values are found… which means the result must be from the set, not an interpolation between set elements. To get that, you can use a simpler function.

def percentile(N, P):
    """
    Find the percentile of a list of values

    @parameter N - A list of values.  N must be sorted.
    @parameter P - A float value from 0.0 to 1.0

    @return - The percentile of the values.
    """
    n = int(round(P * len(N) + 0.5))
    return N[n-1]

# A = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# B = (15, 20, 35, 40, 50)
#
# print percentile(A, P=0.3)
# 4
# print percentile(A, P=0.8)
# 9
# print percentile(B, P=0.3)
# 20
# print percentile(B, P=0.8)
# 50

If you would rather get the value from the supplied list at or below which P percent of values are found, then use this simple modification:

def percentile(N, P):
    n = int(round(P * len(N) + 0.5))
    if n > 1:
        return N[n-2]
    else:
        return N[0]

Or with the simplification suggested by @ijustlovemath:

def percentile(N, P):
    n = max(int(round(P * len(N) + 0.5)), 2)
    return N[n-2]

回答 5

从开始Python 3.8,标准库附带的quantiles功能作为statistics模块的一部分:

from statistics import quantiles

quantiles([1, 2, 3, 4, 5], n=100)
# [0.06, 0.12, 0.18, 0.24, 0.3, 0.36, 0.42, 0.48, 0.54, 0.6, 0.66, 0.72, 0.78, 0.84, 0.9, 0.96, 1.02, 1.08, 1.14, 1.2, 1.26, 1.32, 1.38, 1.44, 1.5, 1.56, 1.62, 1.68, 1.74, 1.8, 1.86, 1.92, 1.98, 2.04, 2.1, 2.16, 2.22, 2.28, 2.34, 2.4, 2.46, 2.52, 2.58, 2.64, 2.7, 2.76, 2.82, 2.88, 2.94, 3.0, 3.06, 3.12, 3.18, 3.24, 3.3, 3.36, 3.42, 3.48, 3.54, 3.6, 3.66, 3.72, 3.78, 3.84, 3.9, 3.96, 4.02, 4.08, 4.14, 4.2, 4.26, 4.32, 4.38, 4.44, 4.5, 4.56, 4.62, 4.68, 4.74, 4.8, 4.86, 4.92, 4.98, 5.04, 5.1, 5.16, 5.22, 5.28, 5.34, 5.4, 5.46, 5.52, 5.58, 5.64, 5.7, 5.76, 5.82, 5.88, 5.94]
quantiles([1, 2, 3, 4, 5], n=100)[49] # 50th percentile (e.g median)
# 3.0

quantiles对于给定的分布,返回切分点dist列表,这些n - 1切点将分n位数间隔(以等概率dist分为n连续间隔)划分为:

statistics.quantiles(dist,*,n = 4,method =’exclusive’)

在那里n,在我们的案例(percentiles)是100

Starting Python 3.8, the standard library comes with the quantiles function as part of the statistics module:

from statistics import quantiles

quantiles([1, 2, 3, 4, 5], n=100)
# [0.06, 0.12, 0.18, 0.24, 0.3, 0.36, 0.42, 0.48, 0.54, 0.6, 0.66, 0.72, 0.78, 0.84, 0.9, 0.96, 1.02, 1.08, 1.14, 1.2, 1.26, 1.32, 1.38, 1.44, 1.5, 1.56, 1.62, 1.68, 1.74, 1.8, 1.86, 1.92, 1.98, 2.04, 2.1, 2.16, 2.22, 2.28, 2.34, 2.4, 2.46, 2.52, 2.58, 2.64, 2.7, 2.76, 2.82, 2.88, 2.94, 3.0, 3.06, 3.12, 3.18, 3.24, 3.3, 3.36, 3.42, 3.48, 3.54, 3.6, 3.66, 3.72, 3.78, 3.84, 3.9, 3.96, 4.02, 4.08, 4.14, 4.2, 4.26, 4.32, 4.38, 4.44, 4.5, 4.56, 4.62, 4.68, 4.74, 4.8, 4.86, 4.92, 4.98, 5.04, 5.1, 5.16, 5.22, 5.28, 5.34, 5.4, 5.46, 5.52, 5.58, 5.64, 5.7, 5.76, 5.82, 5.88, 5.94]
quantiles([1, 2, 3, 4, 5], n=100)[49] # 50th percentile (e.g median)
# 3.0

quantiles returns for a given distribution dist a list of n - 1 cut points separating the n quantile intervals (division of dist into n continuous intervals with equal probability):

statistics.quantiles(dist, *, n=4, method=’exclusive’)

where n, in our case (percentiles) is 100.


回答 6

检查scipy.stats模块:

 scipy.stats.scoreatpercentile

check for scipy.stats module:

 scipy.stats.scoreatpercentile

回答 7

要计算系列的百分位数,请运行:

from scipy.stats import rankdata
import numpy as np

def calc_percentile(a, method='min'):
    if isinstance(a, list):
        a = np.asarray(a)
    return rankdata(a, method=method) / float(len(a))

例如:

a = range(20)
print {val: round(percentile, 3) for val, percentile in zip(a, calc_percentile(a))}
>>> {0: 0.05, 1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3, 6: 0.35, 7: 0.4, 8: 0.45, 9: 0.5, 10: 0.55, 11: 0.6, 12: 0.65, 13: 0.7, 14: 0.75, 15: 0.8, 16: 0.85, 17: 0.9, 18: 0.95, 19: 1.0}

To calculate the percentile of a series, run:

from scipy.stats import rankdata
import numpy as np

def calc_percentile(a, method='min'):
    if isinstance(a, list):
        a = np.asarray(a)
    return rankdata(a, method=method) / float(len(a))

For example:

a = range(20)
print {val: round(percentile, 3) for val, percentile in zip(a, calc_percentile(a))}
>>> {0: 0.05, 1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3, 6: 0.35, 7: 0.4, 8: 0.45, 9: 0.5, 10: 0.55, 11: 0.6, 12: 0.65, 13: 0.7, 14: 0.75, 15: 0.8, 16: 0.85, 17: 0.9, 18: 0.95, 19: 1.0}

回答 8

如果您需要答案成为输入numpy数组的成员,请执行以下操作:

只是要补充一点,默认情况下numpy中的percentile函数将输出计算为输入向量中两个相邻条目的线性加权平均值。在某些情况下,人们可能希望返回的百分位数是向量的实际元素,在这种情况下,从v1.9.0开始,您可以使用“插值”选项,并使用“较低”,“较高”或“最近”。

import numpy as np
x=np.random.uniform(10,size=(1000))-5.0

np.percentile(x,70) # 70th percentile

2.075966046220879

np.percentile(x,70,interpolation="nearest")

2.0729677997904314

后者是向量中的实际项,而前者是两个向量项的线性插值,它们与百分位数接壤

In case you need the answer to be a member of the input numpy array:

Just to add that the percentile function in numpy by default calculates the output as a linear weighted average of the two neighboring entries in the input vector. In some cases people may want the returned percentile to be an actual element of the vector, in this case, from v1.9.0 onwards you can use the “interpolation” option, with either “lower”, “higher” or “nearest”.

import numpy as np
x=np.random.uniform(10,size=(1000))-5.0

np.percentile(x,70) # 70th percentile

2.075966046220879

np.percentile(x,70,interpolation="nearest")

2.0729677997904314

The latter is an actual entry in the vector, while the former is a linear interpolation of two vector entries that border the percentile


回答 9

适用于一系列:用于描述功能

假设您的df包含以下列sales和id。您想要计算销售百分比,则它的工作原理如下:

df['sales'].describe(percentiles = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])

0.0: .0: minimum
1: maximum 
0.1 : 10th percentile and so on

for a series: used describe functions

suppose you have df with following columns sales and id. you want to calculate percentiles for sales then it works like this,

df['sales'].describe(percentiles = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1])

0.0: .0: minimum
1: maximum 
0.1 : 10th percentile and so on

回答 10

计算一维numpy序列或矩阵的百分位数的便捷方法是使用numpy.percentile < https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html >。例:

import numpy as np

a = np.array([0,1,2,3,4,5,6,7,8,9,10])
p50 = np.percentile(a, 50) # return 50th percentile, e.g median.
p90 = np.percentile(a, 90) # return 90th percentile.
print('median = ',p50,' and p90 = ',p90) # median =  5.0  and p90 =  9.0

但是,如果您的数据中有任何NaN值,则上述功能将无用。在这种情况下,建议使用的函数是numpy.nanpercentile < https://docs.scipy.org/doc/numpy/reference/generation/numpy.nanpercentile.html >函数:

import numpy as np

a_NaN = np.array([0.,1.,2.,3.,4.,5.,6.,7.,8.,9.,10.])
a_NaN[0] = np.nan
print('a_NaN',a_NaN)
p50 = np.nanpercentile(a_NaN, 50) # return 50th percentile, e.g median.
p90 = np.nanpercentile(a_NaN, 90) # return 90th percentile.
print('median = ',p50,' and p90 = ',p90) # median =  5.5  and p90 =  9.1

在上面显示的两个选项中,您仍然可以选择插值模式。请按照以下示例进行操作,以便于理解。

import numpy as np

b = np.array([1,2,3,4,5,6,7,8,9,10])
print('percentiles using default interpolation')
p10 = np.percentile(b, 10) # return 10th percentile.
p50 = np.percentile(b, 50) # return 50th percentile, e.g median.
p90 = np.percentile(b, 90) # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.9 , median =  5.5  and p90 =  9.1

print('percentiles using interpolation = ', "linear")
p10 = np.percentile(b, 10,interpolation='linear') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='linear') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='linear') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.9 , median =  5.5  and p90 =  9.1

print('percentiles using interpolation = ', "lower")
p10 = np.percentile(b, 10,interpolation='lower') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='lower') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='lower') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1 , median =  5  and p90 =  9

print('percentiles using interpolation = ', "higher")
p10 = np.percentile(b, 10,interpolation='higher') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='higher') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='higher') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  2 , median =  6  and p90 =  10

print('percentiles using interpolation = ', "midpoint")
p10 = np.percentile(b, 10,interpolation='midpoint') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='midpoint') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='midpoint') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.5 , median =  5.5  and p90 =  9.5

print('percentiles using interpolation = ', "nearest")
p10 = np.percentile(b, 10,interpolation='nearest') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='nearest') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='nearest') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  2 , median =  5  and p90 =  9

如果您的输入数组仅包含整数值,那么您可能会对百分位数答案作为整数感兴趣。如果是这样,请选择插值模式,例如“较低”,“较高”或“最近”。

A convenient way to calculate percentiles for a one-dimensional numpy sequence or matrix is by using numpy.percentile <https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html>. Example:

import numpy as np

a = np.array([0,1,2,3,4,5,6,7,8,9,10])
p50 = np.percentile(a, 50) # return 50th percentile, e.g median.
p90 = np.percentile(a, 90) # return 90th percentile.
print('median = ',p50,' and p90 = ',p90) # median =  5.0  and p90 =  9.0

However, if there is any NaN value in your data, the above function will not be useful. The recommended function to use in that case is the numpy.nanpercentile <https://docs.scipy.org/doc/numpy/reference/generated/numpy.nanpercentile.html> function:

import numpy as np

a_NaN = np.array([0.,1.,2.,3.,4.,5.,6.,7.,8.,9.,10.])
a_NaN[0] = np.nan
print('a_NaN',a_NaN)
p50 = np.nanpercentile(a_NaN, 50) # return 50th percentile, e.g median.
p90 = np.nanpercentile(a_NaN, 90) # return 90th percentile.
print('median = ',p50,' and p90 = ',p90) # median =  5.5  and p90 =  9.1

In the two options presented above, you can still choose the interpolation mode. Follow the examples below for easier understanding.

import numpy as np

b = np.array([1,2,3,4,5,6,7,8,9,10])
print('percentiles using default interpolation')
p10 = np.percentile(b, 10) # return 10th percentile.
p50 = np.percentile(b, 50) # return 50th percentile, e.g median.
p90 = np.percentile(b, 90) # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.9 , median =  5.5  and p90 =  9.1

print('percentiles using interpolation = ', "linear")
p10 = np.percentile(b, 10,interpolation='linear') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='linear') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='linear') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.9 , median =  5.5  and p90 =  9.1

print('percentiles using interpolation = ', "lower")
p10 = np.percentile(b, 10,interpolation='lower') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='lower') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='lower') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1 , median =  5  and p90 =  9

print('percentiles using interpolation = ', "higher")
p10 = np.percentile(b, 10,interpolation='higher') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='higher') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='higher') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  2 , median =  6  and p90 =  10

print('percentiles using interpolation = ', "midpoint")
p10 = np.percentile(b, 10,interpolation='midpoint') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='midpoint') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='midpoint') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  1.5 , median =  5.5  and p90 =  9.5

print('percentiles using interpolation = ', "nearest")
p10 = np.percentile(b, 10,interpolation='nearest') # return 10th percentile.
p50 = np.percentile(b, 50,interpolation='nearest') # return 50th percentile, e.g median.
p90 = np.percentile(b, 90,interpolation='nearest') # return 90th percentile.
print('p10 = ',p10,', median = ',p50,' and p90 = ',p90)
#p10 =  2 , median =  5  and p90 =  9

If your input array only consists of integer values, you might be interested in the percentil answer as an integer. If so, choose interpolation mode such as ‘lower’, ‘higher’, or ‘nearest’.


如何创建全为真或全为假的numpy数组?

问题:如何创建全为真或全为假的numpy数组?

在Python中,如何创建由全True或全False填充的任意形状的numpy数组?

In Python, how do I create a numpy array of arbitrary shape filled with all True or all False?


回答 0

numpy已经可以非常容易地创建全1或全0的数组:

例如numpy.ones((2, 2))numpy.zeros((2, 2))

由于TrueFalsePython中被表示为10,分别,我们只有指定这个数组应该是布尔使用可选dtype参数,我们正在这样做。

numpy.ones((2, 2), dtype=bool)

返回:

array([[ True,  True],
       [ True,  True]], dtype=bool)

更新:2013年10月30日

从numpy 版本1.8开始,我们可以使用full语法更清楚地表明我们意图的语法来达到相同的结果(如fmonegaglia指出):

numpy.full((2, 2), True, dtype=bool)

更新:2017年1月16日

因为至少numpy的1.12版full自动转换结果到了dtype第二个参数,所以我们可以这样写:

numpy.full((2, 2), True)

numpy already allows the creation of arrays of all ones or all zeros very easily:

e.g. numpy.ones((2, 2)) or numpy.zeros((2, 2))

Since True and False are represented in Python as 1 and 0, respectively, we have only to specify this array should be boolean using the optional dtype parameter and we are done.

numpy.ones((2, 2), dtype=bool)

returns:

array([[ True,  True],
       [ True,  True]], dtype=bool)

UPDATE: 30 October 2013

Since numpy version 1.8, we can use full to achieve the same result with syntax that more clearly shows our intent (as fmonegaglia points out):

numpy.full((2, 2), True, dtype=bool)

UPDATE: 16 January 2017

Since at least numpy version 1.12, full automatically casts results to the dtype of the second parameter, so we can just write:

numpy.full((2, 2), True)


回答 1

numpy.full((2,2), True, dtype=bool)
numpy.full((2,2), True, dtype=bool)

回答 2

ones和和zeros分别创建一个全为1和0的数组,它们带有一个可选dtype参数:

>>> numpy.ones((2, 2), dtype=bool)
array([[ True,  True],
       [ True,  True]], dtype=bool)
>>> numpy.zeros((2, 2), dtype=bool)
array([[False, False],
       [False, False]], dtype=bool)

ones and zeros, which create arrays full of ones and zeros respectively, take an optional dtype parameter:

>>> numpy.ones((2, 2), dtype=bool)
array([[ True,  True],
       [ True,  True]], dtype=bool)
>>> numpy.zeros((2, 2), dtype=bool)
array([[False, False],
       [False, False]], dtype=bool)

回答 3

如果它不是可写的,则可以使用以下方法创建这样的数组np.broadcast_to

>>> import numpy as np
>>> np.broadcast_to(True, (2, 5))
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

如果需要可写,也可以fill自己创建一个空数组:

>>> arr = np.empty((2, 5), dtype=bool)
>>> arr.fill(1)
>>> arr
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

这些方法只是替代建议。通常,您应该坚持np.fullnp.zeros或者np.ones像其他答案所建议的那样。

If it doesn’t have to be writeable you can create such an array with np.broadcast_to:

>>> import numpy as np
>>> np.broadcast_to(True, (2, 5))
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

If you need it writable you can also create an empty array and fill it yourself:

>>> arr = np.empty((2, 5), dtype=bool)
>>> arr.fill(1)
>>> arr
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

These approaches are only alternative suggestions. In general you should stick with np.full, np.zeros or np.ones like the other answers suggest.


回答 4

快速运行一个timeit,以查看np.full和之间是否有任何差异np.ones版本。

答:不可以

import timeit

n_array, n_test = 1000, 10000
setup = f"import numpy as np; n = {n_array};"

print(f"np.ones: {timeit.timeit('np.ones((n, n), dtype=bool)', number=n_test, setup=setup)}s")
print(f"np.full: {timeit.timeit('np.full((n, n), True)', number=n_test, setup=setup)}s")

结果:

np.ones: 0.38416870904620737s
np.full: 0.38430388597771525s


重要

关于帖子np.empty(由于声誉太低,我无法评论):

不要那样做。请勿使用np.empty初始化全True数组

由于数组为空,因此不会写入内存,也无法保证您的值是多少,例如

>>> print(np.empty((4,4), dtype=bool))
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True False False]]

Quickly ran a timeit to see, if there are any differences between the np.full and np.ones version.

Answer: No

import timeit

n_array, n_test = 1000, 10000
setup = f"import numpy as np; n = {n_array};"

print(f"np.ones: {timeit.timeit('np.ones((n, n), dtype=bool)', number=n_test, setup=setup)}s")
print(f"np.full: {timeit.timeit('np.full((n, n), True)', number=n_test, setup=setup)}s")

Result:

np.ones: 0.38416870904620737s
np.full: 0.38430388597771525s


IMPORTANT

Regarding the post about np.empty (and I cannot comment, as my reputation is too low):

DON’T DO THAT. DON’T USE np.empty to initialize an all-True array

As the array is empty, the memory is not written and there is no guarantee, what your values will be, e.g.

>>> print(np.empty((4,4), dtype=bool))
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True False False]]

回答 5

>>> a = numpy.full((2,4), True, dtype=bool)
>>> a[1][3]
True
>>> a
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

numpy.full(大小,标量值,类型)。也可以传递其他参数,有关文档,请检查https://docs.scipy.org/doc/numpy/reference/produced/numpy.full.html

>>> a = numpy.full((2,4), True, dtype=bool)
>>> a[1][3]
True
>>> a
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

numpy.full(Size, Scalar Value, Type). There is other arguments as well that can be passed, for documentation on that, check https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html


numpy.newaxis如何工作以及何时使用?

问题:numpy.newaxis如何工作以及何时使用?

当我尝试

numpy.newaxis

结果为我提供了一个x轴从0到1 numpy.newaxis的二维绘图框。但是,当我尝试使用对向量进行切片时,

vector[0:4,]
[ 0.04965172  0.04979645  0.04994022  0.05008303]
vector[:, np.newaxis][0:4,]
[[ 0.04965172]
[ 0.04979645]
[ 0.04994022]
[ 0.05008303]]

除了将行向量更改为列向量之外,是否一样?

通常,的用途是什么numpy.newaxis,我们应该在什么情况下使用它?

When I try

numpy.newaxis

the result gives me a 2-d plot frame with x-axis from 0 to 1. However, when I try using numpy.newaxis to slice a vector,

vector[0:4,]
[ 0.04965172  0.04979645  0.04994022  0.05008303]
vector[:, np.newaxis][0:4,]
[[ 0.04965172]
[ 0.04979645]
[ 0.04994022]
[ 0.05008303]]

Is it the same thing except that it changes a row vector to a column vector?

Generally, what is the use of numpy.newaxis, and in which circumstances should we use it?


回答 0

简而言之,当使用一次时,用于将现有数组的尺寸numpy.newaxis增加一维。从而,

  • 一维阵列将变为二维阵列

  • 2D阵列将变为3D阵列

  • 3D阵列将变成4D阵列

  • 4D阵列将变为5D阵列

等等..

这里是一个视觉说明描绘促进 1D阵列以二维阵列。

newaxis canva可视化


方案1:如上图所示,np.newaxis当您想将一维数组显式转换为行向量列向量时,可能会派上用场。

例:

# 1D array
In [7]: arr = np.arange(4)
In [8]: arr.shape
Out[8]: (4,)

# make it as row vector by inserting an axis along first dimension
In [9]: row_vec = arr[np.newaxis, :]     # arr[None, :]
In [10]: row_vec.shape
Out[10]: (1, 4)

# make it as column vector by inserting an axis along second dimension
In [11]: col_vec = arr[:, np.newaxis]     # arr[:, None]
In [12]: col_vec.shape
Out[12]: (4, 1)

场景2:当我们想将numpy广播用作某些操作的一部分时,例如在添加一些数组时。

例:

假设您要添加以下两个数组:

 x1 = np.array([1, 2, 3, 4, 5])
 x2 = np.array([5, 4, 3])

如果您尝试像这样添加它们,NumPy将引发以下内容ValueError

ValueError: operands could not be broadcast together with shapes (5,) (3,)

在这种情况下,您可以np.newaxis用来增加数组之一的尺寸,以便NumPy可以广播

In [2]: x1_new = x1[:, np.newaxis]    # x1[:, None]
# now, the shape of x1_new is (5, 1)
# array([[1],
#        [2],
#        [3],
#        [4],
#        [5]])

现在,添加:

In [3]: x1_new + x2
Out[3]:
array([[ 6,  5,  4],
       [ 7,  6,  5],
       [ 8,  7,  6],
       [ 9,  8,  7],
       [10,  9,  8]])

另外,您也可以向数组添加新轴x2

In [6]: x2_new = x2[:, np.newaxis]    # x2[:, None]
In [7]: x2_new     # shape is (3, 1)
Out[7]: 
array([[5],
       [4],
       [3]])

现在,添加:

In [8]: x1 + x2_new
Out[8]: 
array([[ 6,  7,  8,  9, 10],
       [ 5,  6,  7,  8,  9],
       [ 4,  5,  6,  7,  8]])

注意请注意,在两种情况下我们都得到相同的结果(但一种是另一种的转置)。


方案3:这类似于方案1。但是,你可以使用np.newaxis不止一次地促进阵列更高的层面。有时对于高阶数组(即Tensors)需要这样的操作。

例:

In [124]: arr = np.arange(5*5).reshape(5,5)

In [125]: arr.shape
Out[125]: (5, 5)

# promoting 2D array to a 5D array
In [126]: arr_5D = arr[np.newaxis, ..., np.newaxis, np.newaxis]    # arr[None, ..., None, None]

In [127]: arr_5D.shape
Out[127]: (1, 5, 5, 1, 1)

关于np.newaxisnp.reshape的更多背景

newaxis 也称为伪索引,它允许将轴临时添加到多数组中。

np.newaxis使用切片运算符来重新创建阵列而np.reshape重塑阵列所需的布局(假设尺寸匹配;这是必须reshape发生)。

In [13]: A = np.ones((3,4,5,6))
In [14]: B = np.ones((4,6))
In [15]: (A + B[:, np.newaxis, :]).shape     # B[:, None, :]
Out[15]: (3, 4, 5, 6)

在上面的示例中,我们在B(使用广播)的第一和第二轴之间插入了一个临时轴。此处使用缺失轴来填充,np.newaxis以使广播操作正常进行。


一般提示:您也可以None代替使用np.newaxis;这些实际上是相同的对象

In [13]: np.newaxis is None
Out[13]: True

PS也看到了一个很好的答案:newaxis vs reshape添加尺寸

Simply put, numpy.newaxis is used to increase the dimension of the existing array by one more dimension, when used once. Thus,

  • 1D array will become 2D array

  • 2D array will become 3D array

  • 3D array will become 4D array

  • 4D array will become 5D array

and so on..

Here is a visual illustration which depicts promotion of 1D array to 2D arrays.

newaxis canva visualization


Scenario-1: np.newaxis might come in handy when you want to explicitly convert a 1D array to either a row vector or a column vector, as depicted in the above picture.

Example:

# 1D array
In [7]: arr = np.arange(4)
In [8]: arr.shape
Out[8]: (4,)

# make it as row vector by inserting an axis along first dimension
In [9]: row_vec = arr[np.newaxis, :]     # arr[None, :]
In [10]: row_vec.shape
Out[10]: (1, 4)

# make it as column vector by inserting an axis along second dimension
In [11]: col_vec = arr[:, np.newaxis]     # arr[:, None]
In [12]: col_vec.shape
Out[12]: (4, 1)

Scenario-2: When we want to make use of numpy broadcasting as part of some operation, for instance while doing addition of some arrays.

Example:

Let’s say you want to add the following two arrays:

 x1 = np.array([1, 2, 3, 4, 5])
 x2 = np.array([5, 4, 3])

If you try to add these just like that, NumPy will raise the following ValueError :

ValueError: operands could not be broadcast together with shapes (5,) (3,)

In this situation, you can use np.newaxis to increase the dimension of one of the arrays so that NumPy can broadcast.

In [2]: x1_new = x1[:, np.newaxis]    # x1[:, None]
# now, the shape of x1_new is (5, 1)
# array([[1],
#        [2],
#        [3],
#        [4],
#        [5]])

Now, add:

In [3]: x1_new + x2
Out[3]:
array([[ 6,  5,  4],
       [ 7,  6,  5],
       [ 8,  7,  6],
       [ 9,  8,  7],
       [10,  9,  8]])

Alternatively, you can also add new axis to the array x2:

In [6]: x2_new = x2[:, np.newaxis]    # x2[:, None]
In [7]: x2_new     # shape is (3, 1)
Out[7]: 
array([[5],
       [4],
       [3]])

Now, add:

In [8]: x1 + x2_new
Out[8]: 
array([[ 6,  7,  8,  9, 10],
       [ 5,  6,  7,  8,  9],
       [ 4,  5,  6,  7,  8]])

Note: Observe that we get the same result in both cases (but one being the transpose of the other).


Scenario-3: This is similar to scenario-1. But, you can use np.newaxis more than once to promote the array to higher dimensions. Such an operation is sometimes needed for higher order arrays (i.e. Tensors).

Example:

In [124]: arr = np.arange(5*5).reshape(5,5)

In [125]: arr.shape
Out[125]: (5, 5)

# promoting 2D array to a 5D array
In [126]: arr_5D = arr[np.newaxis, ..., np.newaxis, np.newaxis]    # arr[None, ..., None, None]

In [127]: arr_5D.shape
Out[127]: (1, 5, 5, 1, 1)

As an alternative, you can use numpy.expand_dims that has an intuitive axis kwarg.

# adding new axes at 1st, 4th, and last dimension of the resulting array
In [131]: newaxes = (0, 3, -1)
In [132]: arr_5D = np.expand_dims(arr, axis=newaxes)
In [133]: arr_5D.shape
Out[133]: (1, 5, 5, 1, 1)

More background on np.newaxis vs np.reshape

newaxis is also called as a pseudo-index that allows the temporary addition of an axis into a multiarray.

np.newaxis uses the slicing operator to recreate the array while numpy.reshape reshapes the array to the desired layout (assuming that the dimensions match; And this is must for a reshape to happen).

Example

In [13]: A = np.ones((3,4,5,6))
In [14]: B = np.ones((4,6))
In [15]: (A + B[:, np.newaxis, :]).shape     # B[:, None, :]
Out[15]: (3, 4, 5, 6)

In the above example, we inserted a temporary axis between the first and second axes of B (to use broadcasting). A missing axis is filled-in here using np.newaxis to make the broadcasting operation work.


General Tip: You can also use None in place of np.newaxis; These are in fact the same objects.

In [13]: np.newaxis is None
Out[13]: True

P.S. Also see this great answer: newaxis vs reshape to add dimensions


回答 1

什么np.newaxis

np.newaxis仅仅是Python的常量的别名None,这意味着无论你使用np.newaxis,你也可以使用None

>>> np.newaxis is None
True

如果您阅读使用而不是的代码,则更具描述np.newaxisNone

如何使用np.newaxis

np.newaxis,通常使用与切片。它表示您要向数组添加其他维度。的位置np.newaxis代表我要添加尺寸的位置。

>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a.shape
(10,)

在第一个示例中,我使用第一个维度中的所有元素并添加第二个维度:

>>> a[:, np.newaxis]
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
>>> a[:, np.newaxis].shape
(10, 1)

第二个示例将一个维添加为第一维,然后将原始数组的第一维中的所有元素用作结果数组的第二维中的元素:

>>> a[np.newaxis, :]  # The output has 2 [] pairs!
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> a[np.newaxis, :].shape
(1, 10)

同样,您可以使用多个np.newaxis添加多个尺寸:

>>> a[np.newaxis, :, np.newaxis]  # note the 3 [] pairs in the output
array([[[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]])
>>> a[np.newaxis, :, np.newaxis].shape
(1, 10, 1)

有替代品np.newaxis吗?

NumPy:还有另一种非常相似的功能,np.expand_dims也可以用于插入一个尺寸:

>>> np.expand_dims(a, 1)  # like a[:, np.newaxis]
>>> np.expand_dims(a, 0)  # like a[np.newaxis, :]

但是考虑到它只是在中插入1s,shape您也可以reshape在数组中添加以下尺寸:

>>> a.reshape(a.shape + (1,))  # like a[:, np.newaxis]
>>> a.reshape((1,) + a.shape)  # like a[np.newaxis, :]

大多数情况下np.newaxis,添加尺寸是最简单的方法,但是最好知道替代方法。

什么时候使用np.newaxis

在某些情况下,添加尺寸很有用:

  • 数据是否应具有指定的维数。例如,如果要matplotlib.pyplot.imshow用于显示一维数组。

  • 如果您想让NumPy广播数组。通过添加维度,您可以例如获取一个数组的所有元素之间的差:a - a[:, np.newaxis]。之所以可行,是因为NumPy操作从最后一个维度1开始广播。

  • 添加必要的尺寸,以便NumPy 可以广播数组。这是可行的,因为每个length-1维仅被广播到另一个数组的对应1维的长度。


1如果您想了解有关广播规则的更多信息,关于该主题NumPy文档非常好。它还包括一个示例np.newaxis

>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[  1.,   2.,   3.],
       [ 11.,  12.,  13.],
       [ 21.,  22.,  23.],
       [ 31.,  32.,  33.]])

What is np.newaxis?

The np.newaxis is just an alias for the Python constant None, which means that wherever you use np.newaxis you could also use None:

>>> np.newaxis is None
True

It’s just more descriptive if you read code that uses np.newaxis instead of None.

How to use np.newaxis?

The np.newaxis is generally used with slicing. It indicates that you want to add an additional dimension to the array. The position of the np.newaxis represents where I want to add dimensions.

>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a.shape
(10,)

In the first example I use all elements from the first dimension and add a second dimension:

>>> a[:, np.newaxis]
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
>>> a[:, np.newaxis].shape
(10, 1)

The second example adds a dimension as first dimension and then uses all elements from the first dimension of the original array as elements in the second dimension of the result array:

>>> a[np.newaxis, :]  # The output has 2 [] pairs!
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> a[np.newaxis, :].shape
(1, 10)

Similarly you can use multiple np.newaxis to add multiple dimensions:

>>> a[np.newaxis, :, np.newaxis]  # note the 3 [] pairs in the output
array([[[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]])
>>> a[np.newaxis, :, np.newaxis].shape
(1, 10, 1)

Are there alternatives to np.newaxis?

There is another very similar functionality in NumPy: np.expand_dims, which can also be used to insert one dimension:

>>> np.expand_dims(a, 1)  # like a[:, np.newaxis]
>>> np.expand_dims(a, 0)  # like a[np.newaxis, :]

But given that it just inserts 1s in the shape you could also reshape the array to add these dimensions:

>>> a.reshape(a.shape + (1,))  # like a[:, np.newaxis]
>>> a.reshape((1,) + a.shape)  # like a[np.newaxis, :]

Most of the times np.newaxis is the easiest way to add dimensions, but it’s good to know the alternatives.

When to use np.newaxis?

In several contexts is adding dimensions useful:

  • If the data should have a specified number of dimensions. For example if you want to use matplotlib.pyplot.imshow to display a 1D array.

  • If you want NumPy to broadcast arrays. By adding a dimension you could for example get the difference between all elements of one array: a - a[:, np.newaxis]. This works because NumPy operations broadcast starting with the last dimension 1.

  • To add a necessary dimension so that NumPy can broadcast arrays. This works because each length-1 dimension is simply broadcast to the length of the corresponding1 dimension of the other array.


1 If you want to read more about the broadcasting rules the NumPy documentation on that subject is very good. It also includes an example with np.newaxis:

>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[  1.,   2.,   3.],
       [ 11.,  12.,  13.],
       [ 21.,  22.,  23.],
       [ 31.,  32.,  33.]])

回答 2

您从一维数字列表开始。使用完后numpy.newaxis,您将其转换为二维矩阵,每个矩阵由四行组成。

然后,您可以使用该矩阵进行矩阵乘法,或者将其用于构建更大的4 xn矩阵。

You started with a one-dimensional list of numbers. Once you used numpy.newaxis, you turned it into a two-dimensional matrix, consisting of four rows of one column each.

You could then use that matrix for matrix multiplication, or involve it in the construction of a larger 4 x n matrix.


回答 3

newaxis选择元组中的object对象用于将结果选择的尺寸扩展一个单位长度尺寸。

这不仅仅是行矩阵到列矩阵的转换。

考虑下面的示例:

In [1]:x1 = np.arange(1,10).reshape(3,3)
       print(x1)
Out[1]: array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

现在让我们为数据添加新维度,

In [2]:x1_new = x1[:,np.newaxis]
       print(x1_new)
Out[2]:array([[[1, 2, 3]],

              [[4, 5, 6]],

              [[7, 8, 9]]])

您可以newaxis在此处看到添加了额外的维度,x1的维度为(3,3),X1_new的维度为(3,1,3)。

我们的新维度如何使我们能够进行不同的操作:

In [3]:x2 = np.arange(11,20).reshape(3,3)
       print(x2)
Out[3]:array([[11, 12, 13],
              [14, 15, 16],
              [17, 18, 19]]) 

将x1_new和x2相加,我们得到:

In [4]:x1_new+x2
Out[4]:array([[[12, 14, 16],
               [15, 17, 19],
               [18, 20, 22]],

              [[15, 17, 19],
               [18, 20, 22],
               [21, 23, 25]],

              [[18, 20, 22],
               [21, 23, 25],
               [24, 26, 28]]])

因此,newaxis不仅仅是行到列矩阵的转换。它增加了矩阵的维数,从而使我们能够对其进行更多操作。

newaxis object in the selection tuple serves to expand the dimensions of the resulting selection by one unit-length dimension.

It is not just conversion of row matrix to column matrix.

Consider the example below:

In [1]:x1 = np.arange(1,10).reshape(3,3)
       print(x1)
Out[1]: array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

Now lets add new dimension to our data,

In [2]:x1_new = x1[:,np.newaxis]
       print(x1_new)
Out[2]:array([[[1, 2, 3]],

              [[4, 5, 6]],

              [[7, 8, 9]]])

You can see that newaxis added the extra dimension here, x1 had dimension (3,3) and X1_new has dimension (3,1,3).

How our new dimension enables us to different operations:

In [3]:x2 = np.arange(11,20).reshape(3,3)
       print(x2)
Out[3]:array([[11, 12, 13],
              [14, 15, 16],
              [17, 18, 19]]) 

Adding x1_new and x2, we get:

In [4]:x1_new+x2
Out[4]:array([[[12, 14, 16],
               [15, 17, 19],
               [18, 20, 22]],

              [[15, 17, 19],
               [18, 20, 22],
               [21, 23, 25]],

              [[18, 20, 22],
               [21, 23, 25],
               [24, 26, 28]]])

Thus, newaxis is not just conversion of row to column matrix. It increases the dimension of matrix, thus enabling us to do more operations on it.


如何将PIL图像转换为numpy数组?

问题:如何将PIL图像转换为numpy数组?

好吧,我想将PIL图像对象来回转换为numpy数组,因此我可以比PIL PixelAccess对象所允许的更快地进行逐像素转换。我已经找到了如何通过以下方式将像素信息放置在有用的3D numpy数组中:

pic = Image.open("foo.jpg")
pix = numpy.array(pic.getdata()).reshape(pic.size[0], pic.size[1], 3)

但是,在完成所有出色的转换之后,我似乎无法弄清楚如何将其重新加载到PIL对象中。我知道该putdata()方法,但似乎无法使其正常工作。

Alright, I’m toying around with converting a PIL image object back and forth to a numpy array so I can do some faster pixel by pixel transformations than PIL’s PixelAccess object would allow. I’ve figured out how to place the pixel information in a useful 3D numpy array by way of:

pic = Image.open("foo.jpg")
pix = numpy.array(pic.getdata()).reshape(pic.size[0], pic.size[1], 3)

But I can’t seem to figure out how to load it back into the PIL object after I’ve done all my awesome transforms. I’m aware of the putdata() method, but can’t quite seem to get it to behave.


回答 0

您并不是在说putdata()行为方式到底有多精确。我假设你在做

>>> pic.putdata(a)
Traceback (most recent call last):
  File "...blablabla.../PIL/Image.py", line 1185, in putdata
    self.im.putdata(data, scale, offset)
SystemError: new style getargs format but argument is not a tuple

这是因为putdata需要一个元组序列,并且您要给它一个numpy数组。这个

>>> data = list(tuple(pixel) for pixel in pix)
>>> pic.putdata(data)

可以工作,但是非常慢。

从PIL 1.1.6开始,在图像和numpy数组之间进行转换“正确”方法很简单

>>> pix = numpy.array(pic)

尽管结果数组的格式与您的格式不同(在这种情况下为3维数组或行/列/ rgb)。

然后,在对阵列进行更改之后,您应该可以执行任一操作pic.putdata(pix)或使用创建新图像Image.fromarray(pix)

You’re not saying how exactly putdata() is not behaving. I’m assuming you’re doing

>>> pic.putdata(a)
Traceback (most recent call last):
  File "...blablabla.../PIL/Image.py", line 1185, in putdata
    self.im.putdata(data, scale, offset)
SystemError: new style getargs format but argument is not a tuple

This is because putdata expects a sequence of tuples and you’re giving it a numpy array. This

>>> data = list(tuple(pixel) for pixel in pix)
>>> pic.putdata(data)

will work but it is very slow.

As of PIL 1.1.6, the “proper” way to convert between images and numpy arrays is simply

>>> pix = numpy.array(pic)

although the resulting array is in a different format than yours (3-d array or rows/columns/rgb in this case).

Then, after you make your changes to the array, you should be able to do either pic.putdata(pix) or create a new image with Image.fromarray(pix).


回答 1

I以数组形式打开:

>>> I = numpy.asarray(PIL.Image.open('test.jpg'))

对进行一些处理I,然后将其转换回图像:

>>> im = PIL.Image.fromarray(numpy.uint8(I))

使用FFT,Python过滤numpy图像

如果出于某种原因要明确地执行此操作,则此页面上的correlation.zip中有使用getdata()的pil2array()和array2pil()函数。

Open I as an array:

>>> I = numpy.asarray(PIL.Image.open('test.jpg'))

Do some stuff to I, then, convert it back to an image:

>>> im = PIL.Image.fromarray(numpy.uint8(I))

Filter numpy images with FFT, Python

If you want to do it explicitly for some reason, there are pil2array() and array2pil() functions using getdata() on this page in correlation.zip.


回答 2

我在Python 3.5中使用Pillow 4.1.1(PIL的后继产品)。枕头和numpy之间的转换非常简单。

from PIL import Image
import numpy as np
im = Image.open('1.jpg')
im2arr = np.array(im) # im2arr.shape: height x width x channel
arr2im = Image.fromarray(im2arr)

需要注意的一件事是,枕头样式im是专栏为主的,而numpy 样式是专栏的im2arr。但是,该功能Image.fromarray已经考虑了这一点。即,arr2im.size == im.sizearr2im.mode == im.mode在上面的例子。

在处理转换后的numpy数组时,例如在进行转换im2arr = np.rollaxis(im2arr, 2, 0)im2arr = np.transpose(im2arr, (2, 0, 1))转换为CxHxW格式时,我们应注意HxWxC数据格式。

I am using Pillow 4.1.1 (the successor of PIL) in Python 3.5. The conversion between Pillow and numpy is straightforward.

from PIL import Image
import numpy as np
im = Image.open('1.jpg')
im2arr = np.array(im) # im2arr.shape: height x width x channel
arr2im = Image.fromarray(im2arr)

One thing that needs noticing is that Pillow-style im is column-major while numpy-style im2arr is row-major. However, the function Image.fromarray already takes this into consideration. That is, arr2im.size == im.size and arr2im.mode == im.mode in the above example.

We should take care of the HxWxC data format when processing the transformed numpy arrays, e.g. do the transform im2arr = np.rollaxis(im2arr, 2, 0) or im2arr = np.transpose(im2arr, (2, 0, 1)) into CxHxW format.


回答 3

您需要通过以下方式将图像转换为numpy数组:

import numpy
import PIL

img = PIL.Image.open("foo.jpg").convert("L")
imgarr = numpy.array(img) 

You need to convert your image to a numpy array this way:

import numpy
import PIL

img = PIL.Image.open("foo.jpg").convert("L")
imgarr = numpy.array(img) 

回答 4

我今天使用的示例:

import PIL
import numpy
from PIL import Image

def resize_image(numpy_array_image, new_height):
    # convert nympy array image to PIL.Image
    image = Image.fromarray(numpy.uint8(numpy_array_image))
    old_width = float(image.size[0])
    old_height = float(image.size[1])
    ratio = float( new_height / old_height)
    new_width = int(old_width * ratio)
    image = image.resize((new_width, new_height), PIL.Image.ANTIALIAS)
    # convert PIL.Image into nympy array back again
    return array(image)

The example, I have used today:

import PIL
import numpy
from PIL import Image

def resize_image(numpy_array_image, new_height):
    # convert nympy array image to PIL.Image
    image = Image.fromarray(numpy.uint8(numpy_array_image))
    old_width = float(image.size[0])
    old_height = float(image.size[1])
    ratio = float( new_height / old_height)
    new_width = int(old_width * ratio)
    image = image.resize((new_width, new_height), PIL.Image.ANTIALIAS)
    # convert PIL.Image into nympy array back again
    return array(image)

回答 5

如果图像以Blob格式(即数据库)存储,则可以使用Billal Begueradj解释的相同技术将图像从Blob转换为字节数组。

就我而言,我需要将图像存储在db表的blob列中:

def select_all_X_values(conn):
    cur = conn.cursor()
    cur.execute("SELECT ImageData from PiecesTable")    
    rows = cur.fetchall()    
    return rows

然后,我创建了一个辅助函数,将我的数据集更改为np.array:

X_dataset = select_all_X_values(conn)
imagesList = convertToByteIO(np.array(X_dataset))

def convertToByteIO(imagesArray):
    """
    # Converts an array of images into an array of Bytes
    """
    imagesList = []

    for i in range(len(imagesArray)):  
        img = Image.open(BytesIO(imagesArray[i])).convert("RGB")
        imagesList.insert(i, np.array(img))

    return imagesList

之后,我可以在神经网络中使用byteArrays了。

plt.imshow(imagesList[0])

If your image is stored in a Blob format (i.e. in a database) you can use the same technique explained by Billal Begueradj to convert your image from Blobs to a byte array.

In my case, I needed my images where stored in a blob column in a db table:

def select_all_X_values(conn):
    cur = conn.cursor()
    cur.execute("SELECT ImageData from PiecesTable")    
    rows = cur.fetchall()    
    return rows

I then created a helper function to change my dataset into np.array:

X_dataset = select_all_X_values(conn)
imagesList = convertToByteIO(np.array(X_dataset))

def convertToByteIO(imagesArray):
    """
    # Converts an array of images into an array of Bytes
    """
    imagesList = []

    for i in range(len(imagesArray)):  
        img = Image.open(BytesIO(imagesArray[i])).convert("RGB")
        imagesList.insert(i, np.array(img))

    return imagesList

After this, I was able to use the byteArrays in my Neural Network.

plt.imshow(imagesList[0])

回答 6

转换Numpy to PIL图像并PIL to Numpy

import numpy as np
from PIL import Image

def pilToNumpy(img):
    return np.array(img)

def NumpyToPil(img):
    return Image.fromarray(img)

Convert Numpy to PIL image and PIL to Numpy

import numpy as np
from PIL import Image

def pilToNumpy(img):
    return np.array(img)

def NumpyToPil(img):
    return Image.fromarray(img)

回答 7

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

您可以通过在压缩特征后将图像解析为numpy()函数来将图像转换为numpy(非规范化)

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

You can transform the image into numpy by parsing the image into numpy() function after squishing out the features( unnormalization)


numpy中的ndarray和array有什么区别?

问题:numpy中的ndarray和array有什么区别?

ndarrayarrayNumpy有什么区别?我在哪里可以找到numpy源代码中的实现?

What is the difference between ndarray and array in Numpy? And where can I find the implementations in the numpy source code?


回答 0

numpy.array只是创建一个便利函数ndarray; 它本身不是类。

您也可以使用创建数组numpy.ndarray,但不建议这样做。来自的文档字符串numpy.ndarray

阵列应该使用来构造arrayzerosempty…这里给出的参数是指低级方法(ndarray(...)用于实例化阵列)。

实现的大部分内容都在C代码中(在multiarray中),但是您可以在这里开始查看ndarray接口:

https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py

numpy.array is just a convenience function to create an ndarray; it is not a class itself.

You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:

Arrays should be constructed using array, zeros or empty … The parameters given here refer to a low-level method (ndarray(...)) for instantiating an array.

Most of the meat of the implementation is in C code, here in multiarray, but you can start looking at the ndarray interfaces here:

https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py


回答 1

numpy.array是一个返回的函数numpy.ndarray。没有对象类型numpy.array。

numpy.array is a function that returns a numpy.ndarray. There is no object type numpy.array.


回答 2

只需几行示例代码即可显示numpy.array和numpy.ndarray之间的区别

热身步骤:构建列表

a = [1,2,3]

检查类型

print(type(a))

你会得到

<class 'list'>

使用np.array构造一个数组(从列表中)

a = np.array(a)

或者,您可以跳过热身步骤,直接进行

a = np.array([1,2,3])

检查类型

print(type(a))

你会得到

<class 'numpy.ndarray'>

告诉你numpy数组的类型是numpy.ndarray

您还可以通过以下方式检查类型

isinstance(a, (np.ndarray))

你会得到

True

以下两行均会给您一条错误消息

np.ndarray(a)                # should be np.array(a)
isinstance(a, (np.array))    # should be isinstance(a, (np.ndarray))

Just a few lines of example code to show the difference between numpy.array and numpy.ndarray

Warm up step: Construct a list

a = [1,2,3]

Check the type

print(type(a))

You will get

<class 'list'>

Construct an array (from a list) using np.array

a = np.array(a)

Or, you can skip the warm up step, directly have

a = np.array([1,2,3])

Check the type

print(type(a))

You will get

<class 'numpy.ndarray'>

which tells you the type of the numpy array is numpy.ndarray

You can also check the type by

isinstance(a, (np.ndarray))

and you will get

True

Either of the following two lines will give you an error message

np.ndarray(a)                # should be np.array(a)
isinstance(a, (np.array))    # should be isinstance(a, (np.ndarray))

回答 3

numpy.ndarray()是一个类,numpy.array()而是要创建的方法/函数ndarray

在numpy docs中,如果您想从ndarray类创建数组,则可以使用以下两种方式进行引用:

1-使用array()zeros()empty()方法: 阵列应该使用数组,零或空构造(参考也参见下文部分)。此处给出的参数指的是ndarray(…)用于实例化数组的低级方法()。

2- ndarray直接来自类: 有两种创建数组的方式__new__:使用:如果buffer为None,则仅使用shape,dtype和order。如果buffer是暴露buffer接口的对象,则将解释所有关键字。

下面的示例给出了一个随机数组,因为我们没有分配缓冲区值:

np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)

array([[ -1.13698227e+002,   4.25087011e-303],
       [  2.88528414e-306,   3.27025015e-309]])         #random

另一个示例是将数组对象分配给缓冲区示例:

>>> np.ndarray((2,), buffer=np.array([1,2,3]),
...            offset=np.int_().itemsize,
...            dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])

从上面的示例中,我们注意到我们无法为“缓冲区”分配列表,我们不得不使用numpy.array()返回缓冲区的ndarray对象

结论:numpy.array()如果要制造numpy.ndarray()物体,则使用“

numpy.ndarray() is a class, while numpy.array() is a method / function to create ndarray.

In numpy docs if you want to create an array from ndarray class you can do it with 2 ways as quoted:

1- using array(), zeros() or empty() methods: Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.

2- from ndarray class directly: There are two modes of creating an array using __new__: If buffer is None, then only shape, dtype, and order are used. If buffer is an object exposing the buffer interface, then all keywords are interpreted.

The example below gives a random array because we didn’t assign buffer value:

np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)

array([[ -1.13698227e+002,   4.25087011e-303],
       [  2.88528414e-306,   3.27025015e-309]])         #random

another example is to assign array object to the buffer example:

>>> np.ndarray((2,), buffer=np.array([1,2,3]),
...            offset=np.int_().itemsize,
...            dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])

from above example we notice that we can’t assign a list to “buffer” and we had to use numpy.array() to return ndarray object for the buffer

Conclusion: use numpy.array() if you want to make a numpy.ndarray() object”


回答 4

我认为与np.array()您一起只能创建C,尽管您提到了该命令,但是当您使用np.isfortran()它检查时说是false。但是,np.ndarrray()当您指定订单时,它将根据提供的订单创建订单。

I think with np.array() you can only create C like though you mention the order, when you check using np.isfortran() it says false. but with np.ndarrray() when you specify the order it creates based on the order provided.


更好地协调两个numpy数组的更好方法

问题:更好地协调两个numpy数组的更好方法

我有两个不同形状的numpy数组,但是长度(引导尺寸)相同。我想对它们中的每一个进行混洗,以使相应的元素继续对应-即相对于它们的前导索引一致地对它们进行混洗。

该代码有效,并说明了我的目标:

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

例如:

>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
       [1, 1],
       [3, 3]]), array([2, 1, 3]))

但是,这感觉笨拙,效率低下且速度慢,并且需要复制数组-我宁愿就地对其进行随机播放,因为它们会很大。

有更好的方法来解决这个问题吗?更快的执行速度和更低的内存使用是我的主要目标,但是优美的代码也将是不错的。

我的另一个想法是:

def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)

这行得通…但是有点吓人,因为我看不到它会继续工作-例如,它看起来像不能在numpy版本中生存的那种东西。

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond — i.e. shuffle them in unison with respect to their leading indices.

This code works, and illustrates my goals:

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

For example:

>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
       [1, 1],
       [3, 3]]), array([2, 1, 3]))

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays — I’d rather shuffle them in-place, since they’ll be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was this:

def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)

This works…but it’s a little scary, as I see little guarantee it’ll continue to work — it doesn’t look like the sort of thing that’s guaranteed to survive across numpy version, for example.


回答 0

您的“吓人”解决方案对我来说似乎并不可怕。调用shuffle()两个相同长度的序列会导致对随机数生成器的调用次数相同,这是随机播放算法中唯一的“随机”元素。通过重置状态,可以确保对随机数生成器的调用将在对的第二次调用中给出相同的结果shuffle(),因此整个算法将生成相同的排列。

如果您不喜欢这种方法,那么另一种解决方案是将数据存储在一个数组中,而不是从一开始就存储在两个数组中,然后在此单个数组中创建两个视图以模拟您现在拥有的两个数组。您可以将单个数组用于改组,并将视图用于所有其他目的。

例如:假设数组ab这个样子的:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

现在我们可以构造一个包含所有数据的数组:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

现在我们创建模拟原始视图的视图 a和的b

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

的数据 a2b2共享c。要同时混洗两个数组,请使用numpy.random.shuffle(c)

在生产代码,你当然会尽量避免创建原始ab根本,并马上创建ca2b2

该解决方案能够适应的情况下a,并b有不同的dtypes。

Your “scary” solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only “random” elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.

If you don’t like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let’s assume the arrays a and b look like this:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views simulating the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.

This solution could be adapted to the case that a and b have different dtypes.


回答 1

您可以使用NumPy的数组索引

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]

这将导致创建单独的统一重组的数组。

Your can use NumPy’s array indexing:

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]

This will result in creation of separate unison-shuffled arrays.


回答 2

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)

要了解更多信息,请参见http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)

To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html


回答 3

很简单的解决方案:

randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]

现在,两个数组x,y都以相同的方式随机洗牌

Very simple solution:

randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]

the two arrays x,y are now both randomly shuffled in the same way


回答 4

James在2015年编写了一个sklearn 解决方案,这很有帮助。但是他添加了一个不需要的随机状态变量。在下面的代码中,自动假定numpy为随机状态。

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)

James wrote in 2015 an sklearn solution which is helpful. But he added a random state variable, which is not needed. In the below code, the random state from numpy is automatically assumed.

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)

回答 5

from np.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array

# Data is currently unshuffled; we should shuffle 
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
from np.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array

# Data is currently unshuffled; we should shuffle 
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]

回答 6

仅使用NumPy将任意数量的数组混合在一起就位。

import numpy as np


def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)

可以这样使用

a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])

shuffle_arrays([a, b, c])

注意事项:

  • 该断言确保所有输入数组沿其第一维具有相同的长度。
  • 数组按其第一个维度在原地随机排列-没有返回任何内容。
  • int32正范围内的随机种子。
  • 如果需要重复播放,可以设置种子值。

随机播放后,可以np.split使用切片对数据进行拆分或使用切片进行引用-取决于应用程序。

Shuffle any number of arrays together, in-place, using only NumPy.

import numpy as np


def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)

And can be used like this

a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])

shuffle_arrays([a, b, c])

A few things to note:

  • The assert ensures that all input arrays have the same length along their first dimension.
  • Arrays shuffled in-place by their first dimension – nothing returned.
  • Random seed within positive int32 range.
  • If a repeatable shuffle is needed, seed value can be set.

After the shuffle, the data can be split using np.split or referenced using slices – depending on the application.


回答 7

您可以制作一个像这样的数组:

s = np.arange(0, len(a), 1)

然后随机播放:

np.random.shuffle(s)

现在使用this作为数组的参数。相同的改组参数返回相同的改组向量。

x_data = x_data[s]
x_label = x_label[s]

you can make an array like:

s = np.arange(0, len(a), 1)

then shuffle it:

np.random.shuffle(s)

now use this s as argument of your arrays. same shuffled arguments return same shuffled vectors.

x_data = x_data[s]
x_label = x_label[s]

回答 8

可以对连接的列表执行就地改组的一种方法是使用种子(可以是随机的)并使用numpy.random.shuffle进行改组。

# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
   np.random.seed(seed)
   np.random.shuffle(a)
   np.random.seed(seed)
   np.random.shuffle(b)

而已。这将以完全相同的方式混洗a和b。这也就地完成,这总是一个优点。

编辑,不要使用np.random.seed()而是使用np.random.RandomState

def shuffle(a, b, seed):
   rand_state = np.random.RandomState(seed)
   rand_state.shuffle(a)
   rand_state.seed(seed)
   rand_state.shuffle(b)

调用它时,只需传入任何种子即可提供随机状态:

a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)

输出:

>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]

编辑:修复了重新设置随机状态的代码

One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.

# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
   np.random.seed(seed)
   np.random.shuffle(a)
   np.random.seed(seed)
   np.random.shuffle(b)

That’s it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.

EDIT, don’t use np.random.seed() use np.random.RandomState instead

def shuffle(a, b, seed):
   rand_state = np.random.RandomState(seed)
   rand_state.shuffle(a)
   rand_state.seed(seed)
   rand_state.shuffle(b)

When calling it just pass in any seed to feed the random state:

a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)

Output:

>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]

Edit: Fixed code to re-seed the random state


回答 9

有一个众所周知的函数可以处理此问题:

from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)

只需将test_size设置为0即可避免拆分,并为您提供随机数据。尽管它通常用于拆分训练数据和测试数据,但它的确也可以洗牌。
文档

将数组或矩阵拆分为随机训练和测试子集

快速实用程序,用于包装输入验证以及next(ShuffleSplit()。split(X,y))和应用程序,以将数据输入到单个调用中,以便在oneliner中拆分(以及可选地对子采样)数据。

There is a well-known function that can handle this:

from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)

Just setting test_size to 0 will avoid splitting and give you shuffled data. Though it is usually used to split train and test data, it does shuffle them too.
From documentation

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner.


回答 10

假设我们有两个数组:a和b。

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]]) 

我们首先可以通过排列第一维来获得行索引

indices = np.random.permutation(a.shape[0])
[1 2 0]

然后使用高级索引。在这里,我们使用相同的索引来同时对两个数组进行混洗。

a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]

这相当于

np.take(a, indices, axis=0)
[[4 5 6]
 [7 8 9]
 [1 2 3]]

np.take(b, indices, axis=0)
[[6 6 6]
 [4 2 0]
 [9 1 1]]

Say we have two arrays: a and b.

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]]) 

We can first obtain row indices by permutating first dimension

indices = np.random.permutation(a.shape[0])
[1 2 0]

Then use advanced indexing. Here we are using the same indices to shuffle both arrays in unison.

a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]

This is equivalent to

np.take(a, indices, axis=0)
[[4 5 6]
 [7 8 9]
 [1 2 3]]

np.take(b, indices, axis=0)
[[6 6 6]
 [4 2 0]
 [9 1 1]]

回答 11

如果要避免复制数组,则建议不要遍历数组,而是遍历数组中的每个元素,然后将其随机交换到数组中的另一个位置

for old_index in len(a):
    new_index = numpy.random.randint(old_index+1)
    a[old_index], a[new_index] = a[new_index], a[old_index]
    b[old_index], b[new_index] = b[new_index], b[old_index]

这实现了Knuth-Fisher-Yates随机播放算法。

If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array, and randomly swap it to another position in the array

for old_index in len(a):
    new_index = numpy.random.randint(old_index+1)
    a[old_index], a[new_index] = a[new_index], a[old_index]
    b[old_index], b[new_index] = b[new_index], b[old_index]

This implements the Knuth-Fisher-Yates shuffle algorithm.


回答 12

这似乎是一个非常简单的解决方案:

import numpy as np
def shuffle_in_unison(a,b):

    assert len(a)==len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)

    return a[c],b[c]

a =  np.asarray([[1, 1], [2, 2], [3, 3]])
b =  np.asarray([11, 22, 33])

shuffle_in_unison(a,b)
Out[94]: 
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))

This seems like a very simple solution:

import numpy as np
def shuffle_in_unison(a,b):

    assert len(a)==len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)

    return a[c],b[c]

a =  np.asarray([[1, 1], [2, 2], [3, 3]])
b =  np.asarray([11, 22, 33])

shuffle_in_unison(a,b)
Out[94]: 
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))

回答 13

举个例子,这就是我在做什么:

combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))

shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)

With an example, this is what I’m doing:

combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))

shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)

回答 14

我扩展了python的random.shuffle()以获取第二个参数:

def shuffle_together(x, y):
    assert len(x) == len(y)

    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i+1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]

这样,我可以确定改组发生在原位,并且函数不会太长或太复杂。

I extended python’s random.shuffle() to take a second arg:

def shuffle_together(x, y):
    assert len(x) == len(y)

    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i+1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]

That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.


回答 15

只需使用 numpy

首先合并两个输入数组,一维数组是labels(y),二维数组是data(x),然后用NumPy shuffle方法将它们洗牌。最后将它们拆分并返回。

import numpy as np

def shuffle_2d(a, b):
    rows= a.shape[0]
    if b.shape != (rows,1):
        b = b.reshape((rows,1))
    S = np.hstack((b,a))
    np.random.shuffle(S)
    b, a  = S[:,0], S[:,1:]
    return a,b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(train, test)

Just use numpy

First merge the two input arrays 1D array is labels(y) and 2D array is data(x) and shuffle them with NumPy shuffle method. Finally split them and return.

import numpy as np

def shuffle_2d(a, b):
    rows= a.shape[0]
    if b.shape != (rows,1):
        b = b.reshape((rows,1))
    S = np.hstack((b,a))
    np.random.shuffle(S)
    b, a  = S[:,0], S[:,1:]
    return a,b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(train, test)

将索引数组转换为1-hot编码的numpy数组

问题:将索引数组转换为1-hot编码的numpy数组

假设我有一个一维numpy数组

a = array([1,0,3])

我想将此编码为2d 1-hot数组

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

有快速的方法可以做到这一点吗?比循环遍历a设置元素更快b

Let’s say I have a 1d numpy array

a = array([1,0,3])

I would like to encode this as a 2D one-hot array

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Is there a quick way to do this? Quicker than just looping over a to set elements of b, that is.


回答 0

您的数组a定义了输出数组中非零元素的列。您还需要定义行,然后使用花式索引:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

Your array a defines the columns of the nonzero elements in the output array. You need to also define the rows and then use fancy indexing:

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max()+1))
>>> b[np.arange(a.size),a] = 1
>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

回答 1

>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

回答 2

如果您使用的是keras,则有一个内置实用程序:

from keras.utils.np_utils import to_categorical   

categorical_labels = to_categorical(int_labels, num_classes=3)

它与@YXD的答案几乎相同(请参阅源代码)。

In case you are using keras, there is a built in utility for that:

from keras.utils.np_utils import to_categorical   

categorical_labels = to_categorical(int_labels, num_classes=3)

And it does pretty much the same as @YXD’s answer (see source-code).


回答 3

这是我发现有用的:

def one_hot(a, num_classes):
  return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

num_classes代表您所拥有的类数量。因此,如果您拥有a形状为(10000,)的向量,则此函数会将其转换为(10000,C)。请注意,a是零索引,即one_hot(np.array([0, 1]), 2)会给[[1, 0], [0, 1]]

正是您想要的,我相信。

PS:来源是序列模型-deeplearning.ai

Here is what I find useful:

def one_hot(a, num_classes):
  return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes stands for number of classes you have. So if you have a vector with shape of (10000,) this function transforms it to (10000,C). Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) will give [[1, 0], [0, 1]].

Exactly what you wanted to have I believe.

PS: the source is Sequence models – deeplearning.ai


回答 4

您可以使用 sklearn.preprocessing.LabelBinarizer

例:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

输出:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

除其他事项外,您可以初始化sklearn.preprocessing.LabelBinarizer()以便的输出transform稀疏。

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

Amongst other things, you may initialize sklearn.preprocessing.LabelBinarizer() so that the output of transform is sparse.


回答 5

您还可以使用numpy的eye函数:

numpy.eye(number of classes)[vector containing the labels]

You can also use eye function of numpy:

numpy.eye(number of classes)[vector containing the labels]


回答 6

这是将一维矢量转换为一维二维热阵列的函数。

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print one_hot_v

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        assert num_classes >= np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

以下是一些用法示例:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

Here is a function that converts a 1-D vector to a 2-D one-hot array.

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print one_hot_v

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        assert num_classes >= np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Below is some example usage:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

回答 7

对于1热编码

   one_hot_encode=pandas.get_dummies(array)

例如

享受编码

For 1-hot-encoding

   one_hot_encode=pandas.get_dummies(array)

For Example

ENJOY CODING


回答 8

我认为简短的答案是“否”。对于更通用的n尺寸,我想到了:

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
z[list(np.indices(z.shape[:-1])) + [a]] = 1

我想知道是否有更好的解决方案-我不喜欢我必须在最后两行中创建这些列表。无论如何,我使用进行了一些测量,timeit看来numpy基于-(indices/ arange)和迭代版本的性能大致相同。

I think the short answer is no. For a more generic case in n dimensions, I came up with this:

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
z[list(np.indices(z.shape[:-1])) + [a]] = 1

I am wondering if there is a better solution — I don’t like that I have to create those lists in the last two lines. Anyway, I did some measurements with timeit and it seems that the numpy-based (indices/arange) and the iterative versions perform about the same.


回答 9

只是在阐述出色答卷K3 — RNC,这里是一个更宽泛的版本:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

此外,这里是这种方法的快速和肮脏的基准,并从一个方法目前公认的答案YXD(微变,让他们提供相同的API但后者只能与1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

后一种方法的速度提高了约35%(MacBook Pro 13 2015),但前一种方法更通用:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Just to elaborate on the excellent answer from K3—rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

Also, here is a quick-and-dirty benchmark of this method and a method from the currently accepted answer by YXD (slightly changed, so that they offer the same API except that the latter works only with 1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is ~35% faster (MacBook Pro 13 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

回答 10

您可以使用以下代码将其转换为单热向量:

令x为具有单个列的普通类向量,该类具有从0到某个数字的类:

import numpy as np
np.eye(x.max()+1)[x]

如果0不是一个类;然后删除+1。

You can use the following code for converting into a one-hot vector:

let x is the normal class vector having a single column with classes 0 to some number:

import numpy as np
np.eye(x.max()+1)[x]

if 0 is not a class; then remove +1.


回答 11

我最近遇到了一个同类问题,发现上述解决方案只有在您的数字在一定范围内时才令人满意。例如,如果您要对以下列表进行一次热编码:

all_good_list = [0,1,2,3,4]

继续,上面已经提到过发布的解决方案。但是如果考虑这些数据怎么办:

problematic_list = [0,23,12,89,10]

如果使用上述方法进行操作,则可能会得到90个单柱色谱柱。这是因为所有答案都包含n = np.max(a)+1。我找到了一个更通用的解决方案,可以为我解决这个问题,并希望与您分享:

import numpy as np
import sklearn
sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

我希望有人在上述解决方案上遇到同样的限制,并且这可能派上用场

I recently ran into a problem of same kind and found said solution which turned out to be only satisfying if you have numbers that go within a certain formation. For example if you want to one-hot encode following list:

all_good_list = [0,1,2,3,4]

go ahead, the posted solutions are already mentioned above. But what if considering this data:

problematic_list = [0,23,12,89,10]

If you do it with methods mentioned above, you will likely end up with 90 one-hot columns. This is because all answers include something like n = np.max(a)+1. I found a more generic solution that worked out for me and wanted to share with you:

import numpy as np
import sklearn
sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope someone encountered same restrictions on above solutions and this might come in handy


回答 12

这种编码类型通常是numpy数组的一部分。如果您使用这样的numpy数组:

a = np.array([1,0,3])

那么有一种非常简单的方法可以将其转换为1-hot编码

out = (np.arange(4) == a[:,None]).astype(np.float32)

而已。

Such type of encoding are usually part of numpy array. If you are using a numpy array like this :

a = np.array([1,0,3])

then there is very simple way to convert that to 1-hot encoding

out = (np.arange(4) == a[:,None]).astype(np.float32)

That’s it.


回答 13

  • p将是一个二维数组。
  • 我们想知道哪个值是连续最高的,在那放置1,在其他任何地方放置0。

清洁简便的解决方案:

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
  • p will be a 2d ndarray.
  • We want to know which value is the highest in a row, to put there 1 and everywhere else 0.

clean and easy solution:

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)

回答 14

使用Neuraxle管道步骤:

  1. 建立你的例子
import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
  1. 做实际的转换
from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)
  1. 断言有效
assert b_pred == b

链接到文档:neuraxle.steps.numpy.OneHotEncoder

Using a Neuraxle pipeline step:

  1. Set up your example
import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
  1. Do the actual conversion
from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)
  1. Assert it works
assert b_pred == b

Link to documentation: neuraxle.steps.numpy.OneHotEncoder


回答 15

这是我根据上述答案和自己的用例编写的一个示例函数:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

Here is an example function that I wrote to do this based upon the answers above and my own use case:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

回答 16

为了添加完整的功能,我仅使用numpy运算符:

   def probs_to_onehot(output_probabilities):
        argmax_indices_array = np.argmax(output_probabilities, axis=1)
        onehot_output_array = np.eye(np.unique(argmax_indices_array).shape[0])[argmax_indices_array.reshape(-1)]
        return onehot_output_array

它以概率矩阵作为输入:例如:

[[0.03038822 0.65810204 0.16549407 0.3797123] … [0.02771272 0.2760752 0.3280924 0.33458805]

它将返回

[[0 1 0 0] … [0 0 0 1]]

I am adding for completion a simple function, using only numpy operators:

   def probs_to_onehot(output_probabilities):
        argmax_indices_array = np.argmax(output_probabilities, axis=1)
        onehot_output_array = np.eye(np.unique(argmax_indices_array).shape[0])[argmax_indices_array.reshape(-1)]
        return onehot_output_array

It takes as input a probability matrix: e.g.:

[[0.03038822 0.65810204 0.16549407 0.3797123 ] … [0.02771272 0.2760752 0.3280924 0.33458805]]

And it will return

[[0 1 0 0] … [0 0 0 1]]


回答 17

这是一个与维数无关的独立解决方案。

这会将arr非负整数的任何N维数组转换为一维N + 1维数组one_hot,其中one_hot[i_1,...,i_N,c] = 1means arr[i_1,...,i_N] = c。您可以通过以下方式恢复输入np.argmax(one_hot, -1)

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    one_hot[flat_grids + [arr.ravel()]] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot

Here’s a dimensionality-independent standalone solution.

This will convert any N-dimensional array arr of nonnegative integers to a one-hot N+1-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1)

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    one_hot[flat_grids + [arr.ravel()]] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot

回答 18

使用以下代码。效果最好。

def one_hot_encode(x):
"""
    argument
        - x: a list of labels
    return
        - one hot encoding matrix (number of labels, number of class)
"""
encoded = np.zeros((len(x), 10))

for idx, val in enumerate(x):
    encoded[idx][val] = 1

return encoded

在这里找到它 PS无需进入链接。

Use the following code. It works best.

def one_hot_encode(x):
"""
    argument
        - x: a list of labels
    return
        - one hot encoding matrix (number of labels, number of class)
"""
encoded = np.zeros((len(x), 10))

for idx, val in enumerate(x):
    encoded[idx][val] = 1

return encoded

Found it here P.S You don’t need to go into the link.


连接两个一维NumPy数组

问题:连接两个一维NumPy数组

我在NumPy中有两个简单的一维数组。我应该能够使用numpy.concatenate将它们连接起来。但是我收到以下代码的错误:

TypeError:只有length-1数组可以转换为Python标量

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

为什么?

I have two simple one-dimensional arrays in NumPy. I should be able to concatenate them using numpy.concatenate. But I get this error for the code below:

TypeError: only length-1 arrays can be converted to Python scalars

Code

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

Why?


回答 0

该行应为:

numpy.concatenate([a,b])

要连接的数组需要作为一个序列而不是作为单独的参数传递。

NumPy文档中

numpy.concatenate((a1, a2, ...), axis=0)

将一系列数组连接在一起。

它试图将您解释b为axis参数,这就是为什么它抱怨无法将其转换为标量。

The line should be:

numpy.concatenate([a,b])

The arrays you want to concatenate need to be passed in as a sequence, not as separate arguments.

From the NumPy documentation:

numpy.concatenate((a1, a2, ...), axis=0)

Join a sequence of arrays together.

It was trying to interpret your b as the axis parameter, which is why it complained it couldn’t convert it into a scalar.


回答 1

连接一维数组有多种可能性,例如,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

对于大型阵列,所有这些选项都同样快。对于小型的,concatenate有一点优势:

在此处输入图片说明

该图是使用perfplot创建的:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

There are several possibilities for concatenating 1D arrays, e.g.,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

All those options are equally fast for large arrays; for small ones, concatenate has a slight edge:

enter image description here

The plot was created with perfplot:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

回答 2

的第一个参数concatenate本身应该是要串联的数组序列

numpy.concatenate((a,b)) # Note the extra parentheses.

The first parameter to concatenate should itself be a sequence of arrays to concatenate:

numpy.concatenate((a,b)) # Note the extra parentheses.

回答 3

另一种方法是使用“ concatenate”的缩写形式,即“ r _ […]”或“ c _ […]”,如下面的示例代码所示(请参见http://wiki.scipy.org / NumPy_for_Matlab_Users以获取更多信息):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

结果是:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

An alternative ist to use the short form of “concatenate” which is either “r_[…]” or “c_[…]” as shown in the example code beneath (see http://wiki.scipy.org/NumPy_for_Matlab_Users for additional information):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

Which results in:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

回答 4

以下是使用numpy.ravel(),的更多方法:numpy.array()利用一维数组可以解包为普通元素的事实:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

Here are more approaches for doing this by using numpy.ravel(), numpy.array(), utilizing the fact that 1D arrays can be unpacked into plain elements:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

回答 5

来自numpy 文档的更多事实:

语法为 numpy.concatenate((a1, a2, ...), axis=0, out=None)

轴= 0用于行连接轴= 1用于列连接

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

希望对您有所帮助!

Some more facts from the numpy docs :

With syntax as numpy.concatenate((a1, a2, ...), axis=0, out=None)

axis = 0 for row-wise concatenation axis = 1 for column-wise concatenation

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

I hope it helps !


Python / NumPy中的meshgrid的用途是什么?

问题:Python / NumPy中的meshgrid的用途是什么?

有人可以向我解释meshgridNumpy 中功能的目的是什么?我知道它会为绘图创建某种坐标网格,但是我真的看不到它的直接好处。

我正在研究Sebastian Raschka的“ Python机器学习”,他正在使用它来绘制决策边界。请参阅此处的输入11 。

我也从官方文档中尝试过此代码,但是再次,输出对我来说真的没有意义。

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
h = plt.contourf(x,y,z)

请,如果可能的话,还请给我展示很多真实的例子。

Can someone explain to me what is the purpose of meshgrid function in Numpy? I know it creates some kind of grid of coordinates for plotting, but I can’t really see the direct benefit of it.

I am studying “Python Machine Learning” from Sebastian Raschka, and he is using it for plotting the decision borders. See input 11 here.

I have also tried this code from official documentation, but, again, the output doesn’t really make sense to me.

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
h = plt.contourf(x,y,z)

Please, if possible, also show me a lot of real-world examples.


回答 0

目的meshgrid是根据x值数组和y值数组创建矩形网格。

因此,例如,如果我们要创建一个网格,在x和y方向上每个介于0和4之间的整数值处都有一个点。要创建矩形网格,我们需要xy点的每个组合。

这将是25分,对吧?因此,如果我们想为所有这些点创建一个x和y数组,则可以执行以下操作。

x[0,0] = 0    y[0,0] = 0
x[0,1] = 1    y[0,1] = 0
x[0,2] = 2    y[0,2] = 0
x[0,3] = 3    y[0,3] = 0
x[0,4] = 4    y[0,4] = 0
x[1,0] = 0    y[1,0] = 1
x[1,1] = 1    y[1,1] = 1
...
x[4,3] = 3    y[4,3] = 4
x[4,4] = 4    y[4,4] = 4

这将导致以下xy矩阵,使得每个矩阵中对应元素的配对给出网格中一个点的x和y坐标。

x =   0 1 2 3 4        y =   0 0 0 0 0
      0 1 2 3 4              1 1 1 1 1
      0 1 2 3 4              2 2 2 2 2
      0 1 2 3 4              3 3 3 3 3
      0 1 2 3 4              4 4 4 4 4

然后,我们可以绘制这些图形以验证它们是否为网格:

plt.plot(x,y, marker='.', color='k', linestyle='none')

在此处输入图片说明

显然,这特别繁琐,尤其对于x和的范围y。相反,meshgrid实际上可以为我们生成此代码:我们只需指定唯一值xy值即可。

xvalues = np.array([0, 1, 2, 3, 4]);
yvalues = np.array([0, 1, 2, 3, 4]);

现在,当我们调用时meshgrid,我们将自动获得先前的输出。

xx, yy = np.meshgrid(xvalues, yvalues)

plt.plot(xx, yy, marker='.', color='k', linestyle='none')

在此处输入图片说明

这些矩形网格的创建对于许多任务很有用。在您的帖子中提供的示例中,这只是一种sin(x**2 + y**2) / (x**2 + y**2)x和的值范围内对函数()进行采样的方法y

由于此函数已在矩形网格上采样,因此现在可以将其可视化为“图像”。

在此处输入图片说明

此外,现在可以将结果传递给期望在矩形网格上获得数据的函数(例如contourf

The purpose of meshgrid is to create a rectangular grid out of an array of x values and an array of y values.

So, for example, if we want to create a grid where we have a point at each integer value between 0 and 4 in both the x and y directions. To create a rectangular grid, we need every combination of the x and y points.

This is going to be 25 points, right? So if we wanted to create an x and y array for all of these points, we could do the following.

x[0,0] = 0    y[0,0] = 0
x[0,1] = 1    y[0,1] = 0
x[0,2] = 2    y[0,2] = 0
x[0,3] = 3    y[0,3] = 0
x[0,4] = 4    y[0,4] = 0
x[1,0] = 0    y[1,0] = 1
x[1,1] = 1    y[1,1] = 1
...
x[4,3] = 3    y[4,3] = 4
x[4,4] = 4    y[4,4] = 4

This would result in the following x and y matrices, such that the pairing of the corresponding element in each matrix gives the x and y coordinates of a point in the grid.

x =   0 1 2 3 4        y =   0 0 0 0 0
      0 1 2 3 4              1 1 1 1 1
      0 1 2 3 4              2 2 2 2 2
      0 1 2 3 4              3 3 3 3 3
      0 1 2 3 4              4 4 4 4 4

We can then plot these to verify that they are a grid:

plt.plot(x,y, marker='.', color='k', linestyle='none')

enter image description here

Obviously, this gets very tedious especially for large ranges of x and y. Instead, meshgrid can actually generate this for us: all we have to specify are the unique x and y values.

xvalues = np.array([0, 1, 2, 3, 4]);
yvalues = np.array([0, 1, 2, 3, 4]);

Now, when we call meshgrid, we get the previous output automatically.

xx, yy = np.meshgrid(xvalues, yvalues)

plt.plot(xx, yy, marker='.', color='k', linestyle='none')

enter image description here

Creation of these rectangular grids is useful for a number of tasks. In the example that you have provided in your post, it is simply a way to sample a function (sin(x**2 + y**2) / (x**2 + y**2)) over a range of values for x and y.

Because this function has been sampled on a rectangular grid, the function can now be visualized as an “image”.

enter image description here

Additionally, the result can now be passed to functions which expect data on rectangular grid (i.e. contourf)


回答 1

由Microsoft Excel提供: 

在此处输入图片说明

Courtesy of Microsoft Excel: 

enter image description here


回答 2

实际上np.meshgrid,文档中已经提到的目的:

np.meshgrid

从坐标向量返回坐标矩阵。

给定一维坐标数组x1,x2,…,xn,制作ND坐标数组以对ND网格上的ND标量/矢量场进行矢量化评估。

因此,其主要目的是创建坐标矩阵。

您可能只是问自己:

为什么我们需要创建坐标矩阵?

使用Python / NumPy需要坐标矩阵的原因是,从坐标到值没有直接关系,除非坐标从零开始并且是纯正整数。然后,您可以只使用数组的索引作为索引。但是,如果不是这种情况,则您需要以某种方式将坐标存储在数据旁边。那就是网格进来的地方。

假设您的数据是:

1  2  1
2  5  2
1  2  1

但是,每个值代表一个水平2公里,垂直3公里的区域。假设您的原点是左上角,并且您想要一个表示可以使用的距离的数组:

import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)

其中v是:

array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])

和h:

array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])

所以,如果你有两个指标,比方说xy(这就是为什么的返回值meshgrid通常是xxxs,而不是x在这种情况下,我选择h了水平!),那么你可以得到该点的x坐标,在Y点和坐标通过使用以下方法在那时的价值:

h[x, y]    # horizontal coordinate
v[x, y]    # vertical coordinate
data[x, y]  # value

这样可以更轻松地跟踪坐标(更重要的是)您可以将其传递给需要知道坐标的函数。

稍长的解释

但是,np.meshgrid本身并不经常直接使用,大多数人只是使用类似对象np.mgrid或中的一种np.ogrid。这里np.mgrid代表sparse=Falsenp.ogridsparse=True情况(我指的是的sparse参数np.meshgrid)。请注意,np.meshgridnp.ogrid和之间有显着差异 np.mgrid:返回的前两个值(如果有两个或多个)将颠倒。通常这无关紧要,但是您应该根据上下文提供有意义的变量名称。

例如,在2D网格的情况下,matplotlib.pyplot.imshow命名第一个返回的项np.meshgrid x和第二个返回项是有意义的,y而对于np.mgrid和则相反np.ogrid

np.ogrid 和稀疏的网格

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

正如已经说过的,与相比np.meshgrid,输出是相反的,这就是为什么我解压缩yy, xx而不是xx, yy

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

这已经看起来像座标,特别是2D图的x和y线。

可视化:

yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title('ogrid (sparse meshgrid)')
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")

在此处输入图片说明

np.mgrid 和密集/充实的网格

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

此处同样适用:与相比,输出反转np.meshgrid

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

ogrid这些数组不同的是,它们在-5 <= xx <= 5中包含所有 xxyy坐标;-5 <= yy <= 5格。

yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title('mgrid (dense meshgrid)')
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")

在此处输入图片说明

功能性

它不仅限于二维,而且这些函数适用于任意尺寸(嗯,Python中为函数提供了最大数量的参数,而NumPy允许最大数量的尺寸):

>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
x1
array([[[[0]]],


       [[[1]]],


       [[[2]]]])
x2
array([[[[1]],

        [[2]],

        [[3]]]])
x3
array([[[[2],
         [3],
         [4]]]])
x4
array([[[[3, 4, 5]]]])

>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
# Identical output so it's omitted here.

即使这些也适用于一维,也有两个(更为常见的)一维网格创建功能:

除了startand stop参数外,它还支持step参数(即使是代表步骤数的复杂步骤):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # The dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])

应用领域

您专门询问了目的,实际上,如果需要坐标系,这些网格非常有用。

例如,如果您有一个NumPy函数,它可以在两个维度上计算距离:

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)

您想知道每个点的距离:

>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])

如果通过密集网格而不是开放网格,则输出将是相同的。NumPys广播使之成为可能!

让我们可视化结果:

plt.figure()
plt.title('distance to point (1, 2)')
plt.imshow(distances, origin='lower', interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()

在此处输入图片说明

这也是当NumPys mgridogrid变得非常方便,因为它可以让你轻松更改网格的分辨率:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above

在此处输入图片说明

但是,由于imshow不支持xy输入,因此必须手动更改报价。如果接受x和,这将非常方便。y坐标,对吗?

使用NumPy编写自然处理网格的函数很容易。此外,NumPy,SciPy,matplotlib中有几个函数希望您传递到网格中。

我喜欢图片,因此让我们来探索一下matplotlib.pyplot.contour

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)

在此处输入图片说明

注意如何正确设置坐标!如果您只是传入,则不会是这种情况density

或举另一个使用天体模型的有趣示例(这次我不太在乎坐标,我只是使用它们来创建一些网格):

from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100, 
                           x_mean=np.random.randint(0, 100), 
                           y_mean=np.random.randint(0, 100), 
                           x_stddev=3, 
                           y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70, 
                            x_0=np.random.randint(0, 100), 
                            y_0=np.random.randint(0, 100), 
                            radius=5)
    z += a2d(x, y)

在此处输入图片说明

尽管那只是为了“外观”,但与Scipy中的功能模型和拟合(例如scipy.interpolate.interp2dscipy.interpolate.griddata甚至显示示例使用np.mgrid)相关的一些功能仍需要网格。其中大多数都适用于开放式网格和密集型网格,但是有些仅适用于其中之一。

Actually the purpose of np.meshgrid is already mentioned in the documentation:

np.meshgrid

Return coordinate matrices from coordinate vectors.

Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,…, xn.

So it’s primary purpose is to create a coordinates matrices.

You probably just asked yourself:

Why do we need to create coordinate matrices?

The reason you need coordinate matrices with Python/NumPy is that there is no direct relation from coordinates to values, except when your coordinates start with zero and are purely positive integers. Then you can just use the indices of an array as the index. However when that’s not the case you somehow need to store coordinates alongside your data. That’s where grids come in.

Suppose your data is:

1  2  1
2  5  2
1  2  1

However, each value represents a 2 kilometers wide region horizontally and 3 kilometers vertically. Suppose your origin is the upper left corner and you want arrays that represent the distance you could use:

import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)

where v is:

array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])

and h:

array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])

So if you have two indices, let’s say x and y (that’s why the return value of meshgrid is usually xx or xs instead of x in this case I chose h for horizontally!) then you can get the x coordinate of the point, the y coordinate of the point and the value at that point by using:

h[x, y]    # horizontal coordinate
v[x, y]    # vertical coordinate
data[x, y]  # value

That makes it much easier to keep track of coordinates and (even more importantly) you can pass them to functions that need to know the coordinates.

A slightly longer explanation

However, np.meshgrid itself isn’t often used directly, mostly one just uses one of similar objects np.mgrid or np.ogrid. Here np.mgrid represents the sparse=False and np.ogrid the sparse=True case (I refer to the sparse argument of np.meshgrid). Note that there is a significant difference between np.meshgrid and np.ogrid and np.mgrid: The first two returned values (if there are two or more) are reversed. Often this doesn’t matter but you should give meaningful variable names depending on the context.

For example, in case of a 2D grid and matplotlib.pyplot.imshow it makes sense to name the first returned item of np.meshgrid x and the second one y while it’s the other way around for np.mgrid and np.ogrid.

np.ogrid and sparse grids

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])
       

As already said the output is reversed when compared to np.meshgrid, that’s why I unpacked it as yy, xx instead of xx, yy:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

This already looks like coordinates, specifically the x and y lines for 2D plots.

Visualized:

yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title('ogrid (sparse meshgrid)')
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")

enter image description here

np.mgrid and dense/fleshed out grids

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
       

The same applies here: The output is reversed compared to np.meshgrid:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
       

Unlike ogrid these arrays contain all xx and yy coordinates in the -5 <= xx <= 5; -5 <= yy <= 5 grid.

yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title('mgrid (dense meshgrid)')
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")

enter image description here

Functionality

It’s not only limited to 2D, these functions work for arbitrary dimensions (well, there is a maximum number of arguments given to function in Python and a maximum number of dimensions that NumPy allows):

>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
x1
array([[[[0]]],


       [[[1]]],


       [[[2]]]])
x2
array([[[[1]],

        [[2]],

        [[3]]]])
x3
array([[[[2],
         [3],
         [4]]]])
x4
array([[[[3, 4, 5]]]])

>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
# Identical output so it's omitted here.

Even if these also work for 1D there are two (much more common) 1D grid creation functions:

Besides the start and stop argument it also supports the step argument (even complex steps that represent the number of steps):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # The dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])
       

Applications

You specifically asked about the purpose and in fact, these grids are extremely useful if you need a coordinate system.

For example if you have a NumPy function that calculates the distance in two dimensions:

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)
    

And you want to know the distance of each point:

>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])
        

The output would be identical if one passed in a dense grid instead of an open grid. NumPys broadcasting makes it possible!

Let’s visualize the result:

plt.figure()
plt.title('distance to point (1, 2)')
plt.imshow(distances, origin='lower', interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()

enter image description here

And this is also when NumPys mgrid and ogrid become very convenient because it allows you to easily change the resolution of your grids:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above

enter image description here

However, since imshow doesn’t support x and y inputs one has to change the ticks by hand. It would be really convenient if it would accept the x and y coordinates, right?

It’s easy to write functions with NumPy that deal naturally with grids. Furthermore, there are several functions in NumPy, SciPy, matplotlib that expect you to pass in the grid.

I like images so let’s explore matplotlib.pyplot.contour:

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)

enter image description here

Note how the coordinates are already correctly set! That wouldn’t be the case if you just passed in the density.

Or to give another fun example using astropy models (this time I don’t care much about the coordinates, I just use them to create some grid):

from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100, 
                           x_mean=np.random.randint(0, 100), 
                           y_mean=np.random.randint(0, 100), 
                           x_stddev=3, 
                           y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70, 
                            x_0=np.random.randint(0, 100), 
                            y_0=np.random.randint(0, 100), 
                            radius=5)
    z += a2d(x, y)
    

enter image description here

Although that’s just “for the looks” several functions related to functional models and fitting (for example scipy.interpolate.interp2d, scipy.interpolate.griddata even show examples using np.mgrid) in Scipy, etc. require grids. Most of these work with open grids and dense grids, however some only work with one of them.


回答 3

假设您有一个功能:

def sinus2d(x, y):
    return np.sin(x) + np.sin(y)

例如,您想要查看0到2 * pi范围内的图像。你会怎么做?有np.meshgrid来自于:

xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
z = sinus2d(xx, yy) # Create the image on this grid

这样的情节看起来像:

import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', interpolation='none')
plt.show()

在此处输入图片说明

所以np.meshgrid只是一个方便。原则上,可以通过以下方式完成此操作:

z2 = sinus2d(np.linspace(0,2*np.pi,100)[:,None], np.linspace(0,2*np.pi,100)[None,:])

但是您需要了解自己的尺寸(假设您有两个以上…)和正确的广播。np.meshgrid为您完成所有这一切。

此外,例如,如果您想进行插值但排除某些值,则meshgrid允许您将坐标与数据一起删除:

condition = z>0.6
z_new = z[condition] # This will make your array 1D

那么您现在将如何进行插值?你可以给xy一个插值函数,scipy.interpolate.interp2d因此您需要一种方法来知道删除了哪些坐标:

x_new = xx[condition]
y_new = yy[condition]

然后您仍然可以使用“正确的”坐标进行插值(在没有网状网格的情况下进行尝试,您将获得很多额外的代码):

from scipy.interpolate import interp2d
interpolated = interp2d(x_new, y_new, z_new)

并且原始网格物体允许您再次在原始网格物体上获得插值:

interpolated_grid = interpolated(xx[0], yy[:, 0]).reshape(xx.shape)

这些只是我使用过的一些示例,meshgrid可能还会更多。

Suppose you have a function:

def sinus2d(x, y):
    return np.sin(x) + np.sin(y)

and you want, for example, to see what it looks like in the range 0 to 2*pi. How would you do it? There np.meshgrid comes in:

xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
z = sinus2d(xx, yy) # Create the image on this grid

and such a plot would look like:

import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', interpolation='none')
plt.show()

enter image description here

So np.meshgrid is just a convenience. In principle the same could be done by:

z2 = sinus2d(np.linspace(0,2*np.pi,100)[:,None], np.linspace(0,2*np.pi,100)[None,:])

but there you need to be aware of your dimensions (suppose you have more than two …) and the right broadcasting. np.meshgrid does all of this for you.

Also meshgrid allows you to delete coordinates together with the data if you, for example, want to do an interpolation but exclude certain values:

condition = z>0.6
z_new = z[condition] # This will make your array 1D

so how would you do the interpolation now? You can give x and y to an interpolation function like scipy.interpolate.interp2d so you need a way to know which coordinates were deleted:

x_new = xx[condition]
y_new = yy[condition]

and then you can still interpolate with the “right” coordinates (try it without the meshgrid and you will have a lot of extra code):

from scipy.interpolate import interp2d
interpolated = interp2d(x_new, y_new, z_new)

and the original meshgrid allows you to get the interpolation on the original grid again:

interpolated_grid = interpolated(xx[0], yy[:, 0]).reshape(xx.shape)

These are just some examples where I used the meshgrid there might be a lot more.


回答 4

meshgrid帮助从两个数组的所有成对点的两个一维数组中创建一个矩形网格。

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

现在,如果您定义了函数f(x,y),并且想将此函数应用于数组’x’和’y’的所有可能的点组合,则可以执行以下操作:

f(*np.meshgrid(x, y))

假设,如果您的函数仅产生两个元素的乘积,那么这就是可以有效地为大型数组获得笛卡尔积的方法。

这里引用

meshgrid helps in creating a rectangular grid from two 1-D arrays of all pairs of points from the two arrays.

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

Now, if you have defined a function f(x,y) and you wanna apply this function to all the possible combination of points from the arrays ‘x’ and ‘y’, then you can do this:

f(*np.meshgrid(x, y))

Say, if your function just produces the product of two elements, then this is how a cartesian product can be achieved, efficiently for large arrays.

Referred from here


回答 5

基本思想

给定可能的x值xs(将其视为图的x轴上的刻度线)和可能的y值ys,将meshgrid生成对应的(x,y)网格点集-与类似set((x, y) for x in xs for y in yx)。例如,如果xs=[1,2,3]ys=[4,5,6],我们将获得一组坐标{(1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}

返回值的形式

但是,meshgrid返回的表示形式在两个方面与上述表达式不同:

首先meshgrid在2d数组中布置网格点:行对应于不同的y值,列对应于不同的x值-如在中list(list((x, y) for x in xs) for y in ys),将得到以下数组:

   [[(1,4), (2,4), (3,4)],
    [(1,5), (2,5), (3,5)],
    [(1,6), (2,6), (3,6)]]

其次,分别meshgrid返回x和y坐标(即在两个不同的numpy 2d数组中):

   xcoords, ycoords = (
       array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]]),
       array([[4, 4, 4],
              [5, 5, 5],
              [6, 6, 6]]))
   # same thing using np.meshgrid:
   xcoords, ycoords = np.meshgrid([1,2,3], [4,5,6])
   # same thing without meshgrid:
   xcoords = np.array([xs] * len(ys)
   ycoords = np.array([ys] * len(xs)).T

注意,np.meshgrid也可以生成更大尺寸的网格。给定xs,ys和zs,您将把xcoords,ycoords,zcoords作为3d数组返回。meshgrid还支持维度的反向排序以及结果的稀疏表示。

应用领域

我们为什么要这种形式的输出?

在网格上的每个点上应用一个函数: 一种动机是像(+,-,*,/,**)这样的二进制运算符对于numpy数组作为元素操作进行了重载。这意味着,如果我有一个def f(x, y): return (x - y) ** 2可以在两个标量上使用的函数,那么我也可以将其应用于两个numpy数组以获取按元素排列的结果数组:例如f(xcoords, ycoords)f(*np.meshgrid(xs, ys))在上面的示例中给出以下内容:

array([[ 9,  4,  1],
       [16,  9,  4],
       [25, 16,  9]])

高维外部产品:我不确定这有多有效,但是您可以通过以下方式获得高维外部产品:np.prod(np.meshgrid([1,2,3], [1,2], [1,2,3,4]), axis=0)

在matplotlib云图:我遇到meshgrid调查时绘制等高线图与matplotlib用于绘制的决策边界。为此,使用生成一个网格meshgrid,在每个网格点评估函数(例如,如上所示),然后将xcoords,ycoords和计算出的f值(即zcoords)传递给contourf函数。

Basic Idea

Given possible x values, xs, (think of them as the tick-marks on the x-axis of a plot) and possible y values, ys, meshgrid generates the corresponding set of (x, y) grid points—analogous to set((x, y) for x in xs for y in yx). For example, if xs=[1,2,3] and ys=[4,5,6], we’d get the set of coordinates {(1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}.

Form of the Return Value

However, the representation that meshgrid returns is different from the above expression in two ways:

First, meshgrid lays out the grid points in a 2d array: rows correspond to different y-values, columns correspond to different x-values—as in list(list((x, y) for x in xs) for y in ys), which would give the following array:

   [[(1,4), (2,4), (3,4)],
    [(1,5), (2,5), (3,5)],
    [(1,6), (2,6), (3,6)]]

Second, meshgrid returns the x and y coordinates separately (i.e. in two different numpy 2d arrays):

   xcoords, ycoords = (
       array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]]),
       array([[4, 4, 4],
              [5, 5, 5],
              [6, 6, 6]]))
   # same thing using np.meshgrid:
   xcoords, ycoords = np.meshgrid([1,2,3], [4,5,6])
   # same thing without meshgrid:
   xcoords = np.array([xs] * len(ys)
   ycoords = np.array([ys] * len(xs)).T

Note, np.meshgrid can also generate grids for higher dimensions. Given xs, ys, and zs, you’d get back xcoords, ycoords, zcoords as 3d arrays. meshgrid also supports reverse ordering of the dimensions as well as sparse representation of the result.

Applications

Why would we want this form of output?

Apply a function at every point on a grid: One motivation is that binary operators like (+, -, *, /, **) are overloaded for numpy arrays as elementwise operations. This means that if I have a function def f(x, y): return (x - y) ** 2 that works on two scalars, I can also apply it on two numpy arrays to get an array of elementwise results: e.g. f(xcoords, ycoords) or f(*np.meshgrid(xs, ys)) gives the following on the above example:

array([[ 9,  4,  1],
       [16,  9,  4],
       [25, 16,  9]])

Higher dimensional outer product: I’m not sure how efficient this is, but you can get high-dimensional outer products this way: np.prod(np.meshgrid([1,2,3], [1,2], [1,2,3,4]), axis=0).

Contour plots in matplotlib: I came across meshgrid when investigating drawing contour plots with matplotlib for plotting decision boundaries. For this, you generate a grid with meshgrid, evaluate the function at each grid point (e.g. as shown above), and then pass the xcoords, ycoords, and computed f-values (i.e. zcoords) into the contourf function.


numpy中的flatten和ravel函数有什么区别?

问题:numpy中的flatten和ravel函数有什么区别?

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

这两个函数返回相同的列表。那么需要两个不同的功能来执行相同的工作。

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

Both function return the same list. Then what is the need of two different functions performing same job.


回答 0

当前的API是:

  • flatten 总是返回一个副本。
  • ravel尽可能返回原始数组的视图。这在打印输出中不可见,但是如果您修改ravel返回的数组,则可能会修改原始数组中的条目。如果您修改从flatten返回的数组中的条目,则将永远不会发生。ravel通常会更快,因为没有内存被复制,但是您在修改返回的数组时要格外小心。
  • reshape((-1,)) 只要数组的步幅允许,就可以得到一个视图,即使这意味着您并不总是可以获得连续的数组。

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn’t visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don’t always get a contiguous array.

回答 1

如此所述,关键区别在于:

  • flatten 是ndarray对象的方法,因此只能用于真正的numpy数组。

  • ravel 是库级别的函数,因此可以在任何可以成功解析的对象上调用。

例如,ravel将对ndarray列表起作用,flatten而不适用于该类型的对象。

@IanH还在回答中指出了与内存处理的重要区别。

As explained here a key difference is that:

  • flatten is a method of an ndarray object and hence can only be called for true numpy arrays.

  • ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.


回答 2

这是函数的正确命名空间:

这两个函数均返回指向新存储器结构的展平一维数组。

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

在上例中:

  • 结果的存储位置不同,
  • 结果看起来一样
  • 展平将返回副本
  • ravel将返回一个视图。

我们如何检查某物是否是副本?使用的.base属性ndarray。如果是视图,则基础将是原始数组;如果是副本,则基数为None

Here is the correct namespace for the functions:

Both functions return flattened 1D arrays pointing to the new memory structures.

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

In the upper example:

  • the memory locations of the results are different,
  • the results look the same
  • flatten would return a copy
  • ravel would return a view.

How we check if something is a copy? Using the .base attribute of the ndarray. If it’s a view, the base will be the original array; if it is a copy, the base will be None.