


Suppose I have:

test = numpy.array([[1, 2], [3, 4], [5, 6]])

test[i] gives me the ith row of the array (e.g. [1, 2]). How can I access the ith column (e.g. [1, 3, 5])? Also, would this be an expensive operation?

Answer 0

>>> test[:,0]
array([1, 3, 5])

gives you column 0. Similarly,

>>> test[1,:]
array([3, 4])

lets you access rows. This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It’s certainly much quicker than accessing each element in a loop.
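As a rough illustration of why this tends to be cheap, here is a minimal sketch using the `test` array from the question. It assumes only standard NumPy behaviour: a basic slice such as `test[:, 0]` returns a view onto the existing data instead of copying it, which `np.shares_memory` can confirm. The loop version is included only for comparison.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col = test[:, 0]   # column 0, selected with a basic slice
row = test[1, :]   # row 1

# A basic slice is a view: no element data is copied.
is_view = np.shares_memory(col, test)

# The loop version builds the same values one element at a time.
col_loop = np.array([test[r, 0] for r in range(test.shape[0])])

print(col, row, is_view, np.array_equal(col, col_loop))
```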

Answer 1


If you want to access more than one column at a time, you can do:

>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])
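One detail worth knowing, sketched below on the same 3x3 array: selecting columns with a list of integers returns a copy, whereas a plain slice of adjacent columns returns a view. The variable names `picked` and `sliced` are only for illustration.

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

picked = test[:, [0, 2]]   # integer-list indexing: result is a copy
sliced = test[:, 0:2]      # basic slice of adjacent columns: result is a view

picked[0, 0] = 100         # does not change test
sliced[0, 1] = -1          # writes through to test[0, 1]

print(test[0])             # first row now holds 0, -1, 2
```

So if the goal is to modify the original array through the selection, a slice works, while a list of indices needs an explicit assignment such as `test[:, [0, 2]] = ...`.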

Answer 2

>>> test[:,0]
array([1, 3, 5])

This command gives you a 1-D array of shape (3,), i.e. a “row vector”. If you just want to loop over it, that’s fine, but if you want to hstack it with another array of dimension 3xN, you will get

ValueError: all the input arrays must have same number of dimensions


>>> test[:,[0]]

gives you a column vector of shape (3, 1), so that you can use it in a concatenate or hstack operation.


>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
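The shape difference is easy to inspect directly. The sketch below compares a few ways of selecting column 0. The slice `0:1` and `np.newaxis` variants are alternatives not mentioned above, and `np.column_stack` is shown as one way to append a 1-D column without reshaping it first.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

flat = test[:, 0]                # shape (3,): 1-D
col_a = test[:, [0]]             # shape (3, 1): 2-D, a copy
col_b = test[:, 0:1]             # shape (3, 1): 2-D, a view
col_c = test[:, 0, np.newaxis]   # shape (3, 1): 2-D, a view

# hstack needs matching dimensions, so use one of the 2-D columns.
stacked = np.hstack((test, col_b))

# column_stack turns 1-D inputs into columns before stacking.
stacked_1d = np.column_stack((test, flat))

print(flat.shape, col_a.shape, col_b.shape, col_c.shape)
print(stacked)
```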

Answer 3


You could also transpose and return a row:

In [4]: test.T[0]
Out[4]: array([1, 3, 5])

Answer 4


To get several independent columns, just use:

> test[:,[0,2]]

You will get columns 0 and 2.

Answer 5



Although the question has been answered, let me mention some nuances.

Let’s say you are interested in a single column of the array, here the one at index 1 (the second column):

arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]])

As you already know from the other answers, to get it in the form of a “row vector” (an array of shape (3,)), you use slicing:

arr_c1_ref = arr[:, 1]  # creates a view of column 1 (the second column) of arr
arr_c1_copy = arr[:, 1].copy()  # creates a copy of that column

To check if an array is a view or a copy of another array you can do the following:

arr_c1_ref.base is arr  # True
arr_c1_copy.base is arr  # False

see ndarray.base.
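A small self-contained sketch of the view versus copy distinction follows. It rebuilds the array locally with an explicit `int32` dtype (an assumption made here so the result does not depend on the platform's default integer) and uses `np.shares_memory` as an additional check alongside `.base`.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)

view = arr[:, 1]          # view of column 1
copy = arr[:, 1].copy()   # independent copy of column 1

view[0] = 20              # changes arr[0, 1] as well
copy[1] = 40              # leaves arr untouched

print(view.base is arr, copy.base is None)
print(np.shares_memory(view, arr), np.shares_memory(copy, arr))
print(arr)
```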

Besides the obvious difference between the two (modifying arr_c1_ref will affect arr), the number of byte-steps for traversing each of them is different:

arr_c1_ref.strides[0]  # 8 bytes (with 4-byte int32 elements; 16 with int64)
arr_c1_copy.strides[0]  # 4 bytes (8 with int64)

see strides. Why is this important? Imagine that you have a very big array A instead of the arr:

A = np.random.randint(2, size=(10000,10000), dtype='int32')
A_c1_ref = A[:, 1] 
A_c1_copy = A[:, 1].copy()

and you want to compute the sum of all the elements of that column, i.e. A_c1_ref.sum() or A_c1_copy.sum(). Using the copied version is much faster:

%timeit A_c1_ref.sum()  # ~248 µs
%timeit A_c1_copy.sum()  # ~12.8 µs

This is due to the different stride sizes mentioned before:

A_c1_ref.strides[0]  # 40000 bytes
A_c1_copy.strides[0]  # 4 bytes
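To try this without IPython's `%timeit`, a rough sketch with the standard `timeit` module is given below. The array is smaller than the one above to keep it quick, `np.random.default_rng` is used instead of `np.random.randint`, and the measured times depend on the machine, so no particular numbers should be expected.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(1000, 1000), dtype=np.int32)

col_view = A[:, 1]          # strided view into the C-ordered array
col_copy = A[:, 1].copy()   # contiguous copy of the same column

# The view steps over a whole row (1000 * 4 bytes) per element,
# while the copy steps over a single element (4 bytes).
print(col_view.strides, col_copy.strides)

t_view = timeit.timeit(col_view.sum, number=1000)
t_copy = timeit.timeit(col_copy.sum, number=1000)
print(f"view: {t_view:.4f} s, copy: {t_copy:.4f} s")
```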

Although it might seem that using column copies is better, this is not always true, because making a copy takes time and uses more memory (in this case it took me approx. 200 µs to create A_c1_copy). However, if we need the copy in the first place, or we need to do many different operations on a specific column of the array and we are OK with sacrificing memory for speed, then making a copy is the way to go.

In the case that we are interested in working mostly with columns, it could be a good idea to create our array in column-major (‘F’) order instead of the row-major (‘C’) order (which is the default), and then do the slicing as before to get a column without copying it:

A = np.asfortranarray(A)  # or np.array(A, order='F')
A_c1_ref = A[:, 1]
A_c1_ref.strides[0]  # 4 bytes
%timeit A_c1_ref.sum()  # ~12.6 µs vs ~248 µs

Now, performing the sum operation (or any other) on a column-view is much faster.
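The effect of the memory order on strides can be seen on a tiny array. This is only a sketch with illustrative names (`C`, `F`). Note the trade-off it exposes: in column-major order the columns become contiguous, but the rows become the strided direction.

```python
import numpy as np

C = np.arange(12, dtype=np.int32).reshape(3, 4)   # row-major (default)
F = np.asfortranarray(C)                          # same values, column-major copy

col_c = C[:, 1]   # column view in C order: one step = one full row
col_f = F[:, 1]   # column view in F order: one step = one element
row_f = F[1, :]   # row view in F order: one step = one full column

print(C.strides, F.strides)
print(col_c.strides, col_f.strides, row_f.strides)
print(F.flags['F_CONTIGUOUS'], np.array_equal(col_c, col_f))
```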

Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.

A.T[1,:].strides[0]  # 40000 for the original C-ordered A, the same as A[:, 1]
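A quick way to convince yourself of this, sketched on a small `int32` array with illustrative names: the transpose shares memory with the original, its shape and strides are the reversed tuples, and row 1 of the transpose is the same strided view as column 1 of the original.

```python
import numpy as np

A = np.arange(12, dtype=np.int32).reshape(3, 4)
At = A.T   # no data is copied

print(A.shape, A.strides)     # shape and strides of the original
print(At.shape, At.strides)   # both tuples reversed

row_of_t = At[1, :]   # row 1 of the transpose
col_of_a = A[:, 1]    # column 1 of the original

print(np.shares_memory(At, A), row_of_t.strides == col_of_a.strides)
```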

Answer 6

>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5

Then you can select the 2nd to 4th columns this way:

>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
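The same selection can be written more compactly. The sketch below rebuilds the 2x5 array with `np.arange` (an assumption about how `test` was created) and compares the `ncol`-based slice with a negative stop index and with explicit bounds.

```python
import numpy as np

test = np.arange(10).reshape(2, 5)
ncol = test.shape[1]

a = test[0:, 1:(ncol - 1)]   # form used above
b = test[:, 1:-1]            # negative stop: everything up to the last column
c = test[:, 1:4]             # explicit bounds, stop index excluded

print(a)
print(np.array_equal(a, b), np.array_equal(a, c))
```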