Question: Creating a NumPy matrix filled with NaNs
I have the following code:
r = numpy.zeros(shape = (width, height, 9))
It creates a width x height x 9 matrix filled with zeros. Instead, I'd like to know if there's a function or an easy way to initialize it with NaNs.
Answer 0
You rarely need loops for vector operations in numpy. You can create an uninitialized array and assign to all entries at once:
>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])
I have timed the alternatives a[:] = numpy.nan (shown here) and a.fill(numpy.nan) (as posted by Blaenk):
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan"
10000 loops, best of 3: 88.8 usec per loop
The timings show ndarray.fill(..) to be the faster alternative. On the other hand, I like NumPy's convenience of assigning a value to a whole slice at once; the code's intention is very clear.
Note that ndarray.fill performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan) will return None instead of the array.
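To make the in-place behaviour concrete, here is a minimal sketch. The variable names (a, result, chained) are only illustrative and not part of the original answer.

```python
import numpy as np

# fill() mutates the array in place and returns None,
# so keep a reference to the array before calling it.
a = np.empty((3, 3))
result = a.fill(np.nan)

# Chaining the two calls discards the array and keeps only None.
chained = np.empty((3, 3)).fill(np.nan)
```

After this runs, a holds the NaN values, while result and chained are both None.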
Answer 1
Another option is to use numpy.full, available in NumPy 1.8+:
a = np.full([height, width, 9], np.nan)
This is pretty flexible and you can fill it with any other number that you want.
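A short sketch of how this might look in practice. The dimensions below are placeholder values chosen for illustration, and numpy.full_like is a related function that the answer itself does not mention.

```python
import numpy as np

height, width = 4, 5  # placeholder dimensions, not from the question

# The dtype is inferred from the fill value, so NaN yields a float array.
a = np.full((height, width, 9), np.nan)

# A different floating-point dtype can be requested explicitly.
b = np.full((height, width, 9), np.nan, dtype=np.float32)

# full_like copies the shape and dtype of an existing array.
c = np.full_like(a, np.nan)
```

Note that an integer dtype cannot hold NaN, so this approach only makes sense for floating-point (or complex) arrays.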
Answer 2
I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except val * ones and array(n * [val]) are equally fast.
Code to reproduce the plot:
import numpy
import perfplot

val = 42.0

def fill(n):
    a = numpy.empty(n)
    a.fill(val)
    return a

def colon(n):
    a = numpy.empty(n)
    a[:] = val
    return a

def full(n):
    return numpy.full(n, val)

def ones_times(n):
    return val * numpy.ones(n)

# Named from_list rather than list so the builtin is not shadowed.
def from_list(n):
    return numpy.array(n * [val])

perfplot.show(
    setup=lambda n: n,
    kernels=[fill, colon, full, ones_times, from_list],
    n_range=[2 ** k for k in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)
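If perfplot is not installed, a rough comparison can be sketched with the standard-library timeit module instead. This is only an approximation of the benchmark above: the array length n and the repeat counts are arbitrary choices, and the numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

val = 42.0
n = 100_000  # arbitrary array length chosen for illustration

def fill(n):
    a = np.empty(n)
    a.fill(val)
    return a

def colon(n):
    a = np.empty(n)
    a[:] = val
    return a

def full(n):
    return np.full(n, val)

def ones_times(n):
    return val * np.ones(n)

kernels = [fill, colon, full, ones_times]

# Best of 5 repeats, 100 calls each.
timings = {
    k.__name__: min(timeit.repeat(lambda k=k: k(n), number=100, repeat=5))
    for k in kernels
}

for name, seconds in timings.items():
    print(f"{name:>10}: {seconds / 100 * 1e6:.1f} usec per call")
```

Unlike perfplot, this measures a single array size, so it says nothing about how the methods scale.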
Answer 3
Are you familiar with numpy.nan?
You can create your own method such as:
import numpy

def nans(shape, dtype=float):
    a = numpy.empty(shape, dtype)
    a.fill(numpy.nan)
    return a
Then
nans([3,4])
would output
array([[ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN,  NaN]])
I found this code in a mailing list thread.
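Since the helper accepts a dtype argument, one possible refinement is to reject dtypes that cannot store NaN at all. The function name nans_checked and the choice of TypeError are assumptions made for this sketch; they do not come from the mailing list code.

```python
import numpy as np

def nans_checked(shape, dtype=float):
    """Return a NaN-filled array, rejecting dtypes that cannot store NaN."""
    dt = np.dtype(dtype)
    # NaN exists only for floating-point and complex dtypes.
    if not np.issubdtype(dt, np.inexact):
        raise TypeError(f"dtype {dt} cannot represent NaN")
    a = np.empty(shape, dt)
    a.fill(np.nan)
    return a
```

Calling nans_checked([3, 4]) behaves like nans([3, 4]), while nans_checked(3, dtype=int) raises a TypeError up front.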
Answer 4
You can always use multiplication if you don't immediately recall the .empty or .full methods:
>>> np.nan * np.ones(shape=(3,2))
array([[ nan,  nan],
       [ nan,  nan],
       [ nan,  nan]])
Of course it works with any other numerical value as well (the result is a float array, because np.ones defaults to float):
>>> 42 * np.ones(shape=(3,2))
array([[ 42.,  42.],
       [ 42.,  42.],
       [ 42.,  42.]])
But @u0b34a0f6ae's accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):
$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop
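One detail of the multiplication trick worth spelling out is the dtype of the result. The sketch below uses illustrative variable names (floats, ints, nans) that are not part of the original answer.

```python
import numpy as np

# np.ones defaults to float64, so the product is a float array
# even when the multiplier is a Python int.
floats = 42 * np.ones(shape=(3, 2))

# Requesting an integer dtype keeps the result integral.
ints = 42 * np.ones(shape=(3, 2), dtype=int)

# NaN has no integer representation, so the NaN product stays float.
nans = np.nan * np.ones(shape=(3, 2))
```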
Answer 5
Another alternative is numpy.broadcast_to(val, n), which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.
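A small sketch of what the read-only view looks like and how to obtain a writable array from it. The shape and variable names are arbitrary illustration choices.

```python
import numpy as np

view = np.broadcast_to(np.nan, (1000, 1000))

# Zero strides: every index maps to the same single stored element.
strides = view.strides
writeable = view.flags.writeable

# Writing to the view is rejected.
try:
    view[0, 0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True

# copy() materialises a normal, writable array (and pays the full memory cost).
arr = view.copy()
arr[0, 0] = 1.0
```

The copy gives up the constant-time and memory advantages, so the view is mainly useful when the array is only ever read.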
Below is a comparison of the performance of all the other proposed methods, using the same benchmark as in Nico Schlömer's answer.
Answer 6
As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:
In [36]: a = numpy.empty(5, dtype=object)
In [37]: a.fill([])
In [38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In [39]: a[0].append(4)
In [40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)
One workaround is, for example:
In [41]: a = numpy.empty(5, dtype=object)
In [42]: a[:] = [[] for x in range(5)]
In [43]: a[0].append(4)
In [44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
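The reason for the surprise is that fill() stores the same list object in every slot. A minimal sketch that checks this with an identity test, and fills each slot individually as an alternative to the slice assignment shown above (the per-element loop is a variant chosen for this sketch, not the original author's code):

```python
import numpy as np

shared = np.empty(5, dtype=object)
shared.fill([])  # one list object, referenced five times
same_object = shared[0] is shared[1]

independent = np.empty(5, dtype=object)
for i in range(len(independent)):
    independent[i] = []  # a fresh list per slot

independent[0].append(4)
```

Here same_object is True, while appending to independent[0] leaves the other slots empty.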
Answer 7
Yet another possibility not yet mentioned here is to use NumPy tile:
a = numpy.tile(numpy.nan, (3, 3))
This also gives
array([[ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN],
       [ NaN,  NaN,  NaN]])
I don't know how its speed compares.