Question: From ND to 1D arrays
Say I have an array a:
import numpy as np
a = np.array([[1,2,3], [4,5,6]])
array([[1, 2, 3],
       [4, 5, 6]])
I would like to convert it to a 1D array (i.e. a column vector):
b = np.reshape(a, (1, np.prod(a.shape)))
but this returns
array([[1, 2, 3, 4, 5, 6]])
which is not the same as:
array([1, 2, 3, 4, 5, 6])
I can take the first element of this array to manually convert it to a 1D array:
b = np.reshape(a, (1, np.prod(a.shape)))[0]
but this requires me to know how many dimensions the original array has (and to chain more [0]'s when working with higher dimensions).
Is there a dimensions-independent way of getting a column/row vector from an arbitrary ndarray?
Answer 0
Use np.ravel (for a 1D view), np.ndarray.flatten (for a 1D copy) or np.ndarray.flat (for a 1D iterator):
In [12]: a = np.array([[1,2,3], [4,5,6]])
In [13]: b = a.ravel()
In [14]: b
Out[14]: array([1, 2, 3, 4, 5, 6])
Note that ravel() returns a view of a when possible, so modifying b also modifies a. ravel() returns a view when the 1D elements are contiguous in memory, but would return a copy if, for example, a were made from slicing another array using a non-unit step size (e.g. a = x[::2]).
If you want a copy rather than a view, use:
In [15]: c = a.flatten()
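A quick way to check the view/copy behaviour described above is np.shares_memory. The sketch below reuses the 2x3 array from the question; the second part takes every other row of an illustrative 4x3 array x, which is not contiguous in memory, so ravel() has to copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

b = a.ravel()    # a is C-contiguous, so ravel() can return a view
c = a.flatten()  # flatten() always returns a new copy

print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False

b[0] = 100       # writing through the view also changes a
print(a[0, 0])   # 100

# Every other row of a 4x3 array: the selected rows are not adjacent in memory
x = np.arange(12).reshape(4, 3)
s = x[::2]
r = s.ravel()    # no single-stride view exists, so ravel() copies
print(np.shares_memory(x, r))  # False
print(r)         # [0 1 2 6 7 8]
```

The copy c was taken before b was modified, so it still starts with 1.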
If you just want an iterator, use np.ndarray.flat:
In [20]: d = a.flat
In [21]: d
Out[21]: <numpy.flatiter object at 0x8ec2068>
In [22]: list(d)
Out[22]: [1, 2, 3, 4, 5, 6]
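ndarray.flat does not depend on the number of dimensions, which is what the question asks for. Besides iteration, the returned numpy.flatiter also supports indexing and assignment in C (row-major) order. A small sketch on an illustrative 3D array a3:

```python
import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

# Iterate over all elements in C order, regardless of a3.ndim
print([int(v) for v in a3.flat][:5])  # [0, 1, 2, 3, 4]

# Flat index 5 maps to a3[0, 1, 1] in C order (5 = 0*12 + 1*4 + 1)
print(a3.flat[5])     # 5
a3.flat[5] = -1       # assigning through .flat writes into a3 itself
print(a3[0, 1, 1])    # -1

# Materialise a real 1D array from the iterator when one is needed
d = np.fromiter(a3.flat, dtype=a3.dtype)
print(d.shape)        # (24,)
```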
Answer 1
In [14]: b = np.reshape(a, (np.prod(a.shape),))
In [15]: b
Out[15]: array([1, 2, 3, 4, 5, 6])
or, simply:
In [16]: a.flatten()
Out[16]: array([1, 2, 3, 4, 5, 6])
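Passing -1 instead of computing the product of the shape lets NumPy infer the length, so the same call works for any number of dimensions and avoids the per-dimension [0] indexing from the question. A minimal sketch; to_1d is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def to_1d(arr):
    """Return a 1D version of arr, whatever its number of dimensions.

    reshape(-1) lets NumPy infer the length; it returns a view when it can
    and a copy otherwise.
    """
    return np.asarray(arr).reshape(-1)

for shape in [(6,), (2, 3), (2, 3, 4), (2, 1, 3, 1)]:
    x = np.arange(np.prod(shape)).reshape(shape)
    print(shape, "->", to_1d(x).shape)
# (6,) -> (6,)
# (2, 3) -> (6,)
# (2, 3, 4) -> (24,)
# (2, 1, 3, 1) -> (6,)
```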
Answer 2
One of the simplest ways is to use flatten(), as in this example:
import numpy as np

# train_output (a pandas DataFrame) and sample come from my own training code
batch_y = train_output.iloc[sample, :]
batch_y = np.array(batch_y).flatten()
My array was like this:
   0
0  6
1  6
2  5
3  4
4  3
...
After using flatten():
array([6, 6, 5, ..., 5, 3, 6])
It's also the solution to errors of this type:
Cannot feed value of shape (100, 1) for Tensor 'input/Y:0', which has shape '(?,)'
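The error above is a shape mismatch: the labels come out of the DataFrame as a 2D column of shape (N, 1), while the tensor expects shape (N,). A NumPy-only sketch of the same fix; the labels array is an illustrative stand-in for np.array(batch_y), using the first values from the printout above:

```python
import numpy as np

# Stand-in for np.array(batch_y): one label per row, shape (5, 1)
labels = np.array([[6], [6], [5], [4], [3]])
print(labels.shape)                  # (5, 1)

flat = labels.flatten()              # always a copy, shape (5,)
print(flat.shape)                    # (5,)

# Alternatives with the same result for an (N, 1) column:
print(labels.ravel().shape)          # (5,)  view when possible
print(labels[:, 0].shape)            # (5,)  select the only column
print(labels.squeeze(axis=1).shape)  # (5,)  drop the length-1 axis
```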
Answer 3
For a list of arrays with different sizes, use the following:
import numpy as np
# list of 1D arrays (lists) with different sizes
a = [[1],[2,3,4,5],[6,7,8]]
# stack them
b = np.hstack(a)
print(b)
Output:
[1 2 3 4 5 6 7 8]
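np.hstack works here because every element of a is one-dimensional, so hstack simply joins them end to end along axis 0; np.concatenate gives the same result for 1D inputs. A sketch comparing the two on the same ragged list, assuming every inner element is 1D:

```python
import numpy as np

# Ragged list: each inner list is 1D but the lengths differ
a = [[1], [2, 3, 4, 5], [6, 7, 8]]

b = np.hstack(a)       # for 1D inputs, hstack joins along axis 0
c = np.concatenate(a)  # same result for 1D inputs

print(b)                     # [1 2 3 4 5 6 7 8]
print(np.array_equal(b, c))  # True

# np.array(a).ravel() would not help: the inner lists have different
# lengths, so NumPy cannot build a regular 2D array from them
# (recent NumPy versions raise ValueError).
```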
Answer 4
I wanted to see benchmark results for the functions mentioned in the answers, including unutbu's.
I also want to point out that the NumPy docs recommend arr.reshape(-1) when a view is preferable (even though ravel is a tad faster in the results below).
TL;DR: np.ravel is the most performant (by a very small margin).
Benchmark
Functions: ravel, reshape(-1), flatten, flat
NumPy version: 1.18.0
Execution times on different ndarray sizes (total seconds returned by timeit with number=1000):
+-------------+----------+-----------+-----------+-------------+
| function | 10x10 | 100x100 | 1000x1000 | 10000x10000 |
+-------------+----------+-----------+-----------+-------------+
| ravel | 0.002073 | 0.002123 | 0.002153 | 0.002077 |
| reshape(-1) | 0.002612 | 0.002635 | 0.002674 | 0.002701 |
| flatten | 0.000810 | 0.007467 | 0.587538 | 107.321913 |
| flat | 0.000337 | 0.000255 | 0.000227 | 0.000216 |
+-------------+----------+-----------+-----------+-------------+
Conclusion
The execution times of ravel and reshape(-1) were consistent and independent of ndarray size. ravel is a tad faster, but reshape provides flexibility in choosing the output shape (maybe that's why the NumPy docs recommend it instead, or there could be some cases where reshape returns a view and ravel doesn't).
If you are dealing with large ndarrays, using flatten can cause performance issues, so I recommend not using it unless you need a copy of the data to do something else.
Used code
import timeit
setup = '''
import numpy as np
nd = np.random.randint(10, size=(10, 10))
'''
timeit.timeit('nd = np.reshape(nd, -1)', setup=setup, number=1000)
timeit.timeit('nd = np.ravel(nd)', setup=setup, number=1000)
timeit.timeit('nd = nd.flatten()', setup=setup, number=1000)
timeit.timeit('nd.flat', setup=setup, number=1000)
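Two details of the code above matter when reading the table: the timed statements rebind nd, so after the first repetition ravel and reshape operate on the already-flattened 1D array, and nd.flat only creates the iterator without visiting any elements, which is why its time stays flat across sizes. Below is a sketch of a variant that keeps nd two-dimensional and consumes the iterator with np.fromiter for a like-for-like comparison. The helper name bench, the sizes and the repetition count are arbitrary choices, and no timings are claimed here:

```python
import timeit

import numpy as np

def bench(shape, number=10):
    """Total seconds for `number` runs of each approach on one random array.

    The lambdas only return their result and never rebind nd, so nd keeps
    its original shape for every repetition.
    """
    nd = np.random.randint(10, size=shape)
    candidates = {
        "ravel": lambda: np.ravel(nd),
        "reshape(-1)": lambda: np.reshape(nd, -1),
        "flatten": lambda: nd.flatten(),
        # nd.flat alone only builds an iterator; np.fromiter walks all elements
        "flat (consumed)": lambda: np.fromiter(nd.flat, dtype=nd.dtype),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}

for shape in [(10, 10), (100, 100), (500, 500)]:
    times = bench(shape)
    print(shape, {name: f"{t:.6f}" for name, t in times.items()})
```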
Answer 5
Although this isn't using the np array format (I was too lazy to modify my code), this should do what you want. If you truly want a column vector, you will need to transpose the vector result. It all depends on how you plan to use it.
def getVector(data_array, col):
    vector = []
    imax = len(data_array)
    for i in range(imax):
        vector.append(data_array[i][col])
    return vector

a = ([1,2,3], [4,5,6])
b = getVector(a, 1)
print(b)
Out> [2, 5]
So if you need to transpose, you can do something like this:
def transposeArray(data_array):
    # Test whether this is a 1D list: len(data_array[0]) fails on a scalar
    two_d = isinstance(data_array[0], (list, tuple))
    if two_d:
        n_rows, n_cols = len(data_array), len(data_array[0])
    else:
        # treat a 1D list as a single row
        n_rows, n_cols = 1, len(data_array)
    # init output transposed array: n_cols rows by n_rows columns
    data_array_t = [[0 for _ in range(n_rows)] for _ in range(n_cols)]
    # fill output transposed array
    for i in range(n_rows):
        for j in range(n_cols):
            if two_d:
                data_array_t[j][i] = data_array[i][j]
            else:
                data_array_t[j][i] = data_array[j]
    return data_array_t
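With NumPy arrays, both helpers above reduce to indexing. A sketch using the same data as the example, converted with np.array: a[:, col] gives the column as a 1D array, and reshape(-1, 1) (or a[:, [col]]) gives an explicit (n, 1) column vector.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

col = a[:, 1]                 # like getVector(a, 1), but a 1D NumPy array
print(col)                    # [2 5]

col_vec = col.reshape(-1, 1)  # explicit column vector, shape (2, 1)
print(col_vec.shape)          # (2, 1)

# Selecting with a list of column indices keeps the 2D shape directly
print(a[:, [1]].shape)        # (2, 1)

# Transposing a 2D array needs no loops either
print(a.T.shape)              # (3, 2)
```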