Tag Archives: arrays

Convert a 1D array into a 2D array in numpy

Question: Convert a 1D array into a 2D array in numpy

I want to convert a 1-dimensional array into a 2-dimensional array by specifying the number of columns in the 2D array. Something that would work like this:

> import numpy as np
> A = np.array([1,2,3,4,5,6])
> B = vec2matrix(A,ncol=2)
> B
array([[1, 2],
       [3, 4],
       [5, 6]])

Does numpy have a function that works like my made-up function “vec2matrix”? (I understand that you can index a 1D array like a 2D array, but that isn’t an option in the code I have – I need to make this conversion.)


Answer 0

You want to reshape the array.

B = np.reshape(A, (-1, 2))

where -1 infers the size of the new dimension from the size of the input array.
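
For instance, checking the behaviour with the array from the question (a quick sketch, assuming numpy is imported as np):

import numpy as np

A = np.array([1, 2, 3, 4, 5, 6])
B = np.reshape(A, (-1, 2))   # -1 lets numpy infer 3 rows from the 6 elements
print(B)
# [[1 2]
#  [3 4]
#  [5 6]]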


Answer 1

You have two options:

  • If you no longer want the original shape, the easiest is just to assign a new shape to the array

    a.shape = (a.size//ncols, ncols)
    

    You can switch the a.size//ncols by -1 to compute the proper shape automatically. Make sure that a.shape[0]*a.shape[1]=a.size, else you’ll run into some problem.

  • You can get a new array with the np.reshape function, that works mostly like the version presented above

    new = np.reshape(a, (-1, ncols))
    

    When it’s possible, new will be just a view of the initial array a, meaning that the data are shared. In some cases, though, the new array will be a copy instead. Note that np.reshape also accepts an optional keyword order that lets you switch from row-major C order to column-major Fortran order. np.reshape is the function version of the a.reshape method.

If you can’t respect the requirement a.shape[0]*a.shape[1]=a.size, you’re stuck with having to create a new array. You can use the np.resize function and mix it with np.reshape, such as

>>> a =np.arange(9)
>>> np.resize(a, 10).reshape(5,2)
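
As a small illustration of the order keyword mentioned above (a sketch, not part of the original answer):

import numpy as np

a = np.arange(6)
print(np.reshape(a, (3, 2)))             # C (row-major) order: [[0 1] [2 3] [4 5]]
print(np.reshape(a, (3, 2), order='F'))  # Fortran (column-major) order: [[0 3] [1 4] [2 5]]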

Answer 2

Try something like:

B = np.reshape(A,(-1,ncols))

You’ll need to make sure that you can divide the number of elements in your array by ncols though. You can also play with the order in which the numbers are pulled into B using the order keyword.


Answer 3

If your sole purpose is to convert a 1d array X to a 2d array just do:

X = np.reshape(X,(1, X.size))
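
For example, a quick check of the resulting shape (a sketch with made-up data):

import numpy as np

X = np.array([1, 2, 3, 4])
X = np.reshape(X, (1, X.size))
print(X.shape)   # (1, 4)
print(X)         # [[1 2 3 4]]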

Answer 4

import numpy as np
array = np.arange(8) 
print("Original array : \n", array)
array = np.arange(8).reshape(2, 4)
print("New array : \n", array)

Answer 5

some_array.shape = (1,)+some_array.shape

or get a new one

another_array = numpy.reshape(some_array, (1,)+some_array.shape)

This increases the number of dimensions by one, which is equivalent to adding a pair of brackets at the outermost level.
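
A short check of the shapes involved (a sketch using a small sample array):

import numpy as np

some_array = np.arange(6).reshape(2, 3)
print(some_array.shape)        # (2, 3)

another_array = np.reshape(some_array, (1,) + some_array.shape)
print(another_array.shape)     # (1, 2, 3)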


Answer 6

You can use flatten() from the numpy package.

import numpy as np
a = np.array([[1, 2],
       [3, 4],
       [5, 6]])
a_flat = a.flatten()
print(f"original array: {a} \nflattened array = {a_flat}")

Output:

original array: [[1 2]
 [3 4]
 [5 6]] 
flattened array = [1 2 3 4 5 6]

Answer 7

Change 1D array into 2D array without using Numpy.

l = [i for i in range(1,21)]
part = 3
new = []
start, end = 0, part


while end <= len(l):
    temp = []
    for i in range(start, end):
        temp.append(l[i])
    new.append(temp)
    start += part
    end += part
print("new values:  ", new)


# for uneven cases
temp = []
while start < len(l):
    temp.append(l[start])
    start += 1
new.append(temp)  # append the leftover chunk once, after it has been collected
print("new values for uneven cases:   ", new)

Selecting specific rows and columns from a NumPy array

Question: Selecting specific rows and columns from a NumPy array

I’ve been going crazy trying to figure out what stupid thing I’m doing wrong here.

I’m using NumPy, and I have specific row indices and specific column indices that I want to select from. Here’s the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print a[[0, 1, 3], :]
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print a[[0, 1, 3], 2]
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
print a[[0,1,3], [0,2]]
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I’m expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]

Answer 0

Fancy indexing requires you to provide all indices for each dimension. You are providing 3 indices for the first one, and only 2 for the second one, hence the error. You want to do something like this:

>>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

That is of course a pain to write, so you can let broadcasting help you:

>>> a[[[0], [1], [3]], [0, 2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

This is much simpler to do if you index with arrays, not lists:

>>> row_idx = np.array([0, 1, 3])
>>> col_idx = np.array([0, 2])
>>> a[row_idx[:, None], col_idx]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Answer 1

As Toan suggests, a simple hack would be to just select the rows first, and then select the columns over that.

>>> a[[0,1,3], :]            # Returns the rows you want
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]]  # Selects the columns you want as well
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

[Edit] The built-in method: np.ix_

I recently discovered that numpy gives you an in-built one-liner to doing exactly what @Jaime suggested, but without having to use broadcasting syntax (which suffers from lack of readability). From the docs:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].

So you use it like this:

>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

And the way it works is that it takes care of aligning arrays the way Jaime suggested, so that broadcasting happens properly:

>>> np.ix_([0,1,3], [0,2])
(array([[0],
        [1],
        [3]]), array([[0, 2]]))

Also, as MikeC says in a comment, np.ix_ has the advantage of returning a view, which my first (pre-edit) answer did not. This means you can now assign to the indexed array:

>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a    
array([[-1,  1, -1,  3],
       [-1,  5, -1,  7],
       [ 8,  9, 10, 11],
       [-1, 13, -1, 15],
       [16, 17, 18, 19]])

Answer 2

USE:

>>> a[[0,1,3]][:,[0,2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

OR:

>>> a[[0,1,3],::2]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Answer 3

Using np.ix_ is the most convenient way to do it (as answered by others), but here is another interesting way to do it:

>>> rows = [0, 1, 3]
>>> cols = [0, 2]

>>> a[rows].T[cols].T

array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

What does population[:] mean?

Question: What does population[:] mean?

I’m analyzing some Python code and I don’t know what

pop = population[:]

means. Is it something like array lists in Java or like a bi-dimensional array?


Answer 0

It is an example of slice notation, and what it does depends on the type of population. If population is a list, this line will create a shallow copy of the list. For an object of type tuple or a str, it will do nothing (the line will do the same without [:]), and for a (say) NumPy array, it will create a new view to the same data.
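
A minimal sketch contrasting the list and NumPy cases described above (the variable names are made up for illustration):

import numpy as np

population = [1, 2, 3]
pop = population[:]      # shallow copy of the list
pop[0] = 99
print(population)        # [1, 2, 3] -- the original list is unchanged

arr = np.array([1, 2, 3])
view = arr[:]            # a new view onto the same data
view[0] = 99
print(arr)               # [99  2  3] -- the change is visible through the view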


Answer 1

It might also help to know that a list slice in general makes a copy of part of the list. E.g. population[2:4] will return a list containing population[2] and population[3] (slicing is right-exclusive). Leaving out the left and right index, as in population[:], makes them default to 0 and len(population) respectively, thereby selecting the entire list. Hence this is a common idiom to make a copy of a list.
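
For instance, a minimal sketch of the defaults described above:

population = ['a', 'b', 'c', 'd', 'e']
print(population[2:4])      # ['c', 'd'] -- right-exclusive
copy = population[:]        # same as population[0:len(population)]
print(copy)                 # ['a', 'b', 'c', 'd', 'e']
print(copy is population)   # False -- it is a new list object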


Answer 2

well… this really depends on the context. Ultimately, it passes a slice object (slice(None,None,None)) to one of the following methods: __getitem__, __setitem__ or __delitem__. (Actually, if the object has a __getslice__, that will be used instead of __getitem__, but that is now deprecated and shouldn’t be used).

Objects can do what they want with the slice.

In the context of:

x = obj[:]

This will call obj.__getitem__ with the slice object passed in. In fact, this is completely equivalent to:

x = obj[slice(None,None,None)]

(although the former is probably more efficient because it doesn’t have to look up the slice constructor — It’s all done in bytecode).

For most objects, this is a way to create a shallow copy of a portion of the sequence.

Next:

x[:] = obj

Is a way to set the items (it calls __setitem__) based on obj.

and, I think you can probably guess what:

del x[:]

calls ;-).

You can also pass different slices:

x[1:4]

constructs slice(1,4,None)

x[::-1]

constructs slice(None,None,-1) and so forth. Further reading: Explain Python’s slice notation
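
To see which slice object each syntax produces, here is a minimal sketch with a hypothetical Demo class whose __getitem__ simply returns its argument:

class Demo:
    def __getitem__(self, key):
        # return the key so we can inspect what the [...] syntax passed in
        return key

d = Demo()
print(d[:])      # slice(None, None, None)
print(d[1:4])    # slice(1, 4, None)
print(d[::-1])   # slice(None, None, -1)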


Answer 3

It is a slice from the beginning of the sequence to the end, usually producing a shallow copy.

(Well, it’s more than that, but you don’t need to care yet.)


Answer 4

It creates a copy of the list, versus just assigning a new name for the already existing list.


Answer 5

[:]
is used as a limiter or for slicing in arrays and hashes,
e.g.:
[1:5] displays the values from index 1 inclusive to 5 exclusive, i.e. 1-4
[start:end]

It is basically used for slicing arrays: the brackets accept a variable that refers to the value or key to display, and ":" is used to limit or slice the entire array into chunks.


Converting a list to an array in order to use the ravel() function

Question: Converting a list to an array in order to use the ravel() function

I have a list in python and I want to convert it to an array to be able to use ravel() function.


Answer 0

Use numpy.asarray:

import numpy as np
myarray = np.asarray(mylist)
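
Since the goal was to call ravel(), a quick follow-up sketch (mylist here is a made-up nested list):

import numpy as np

mylist = [[1, 2], [3, 4]]
myarray = np.asarray(mylist)
print(myarray.ravel())   # [1 2 3 4]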

Answer 1

create an int array and a list

from array import array
listA = list(range(0,50))
for item in listA:
    print(item)
arrayA = array("i", listA)
for item in arrayA:
    print(item)

Answer 2

I wanted a way to do this without using an extra module. First turn list to string, then append to an array:

dataset_list = ''.join(input_list)
dataset_array = []
for item in dataset_list.split(';'): # comma, or other
    dataset_array.append(item)

Answer 3

If all you want is calling ravel on your (nested, I s’pose?) list, you can do that directly, numpy will do the casting for you:

import numpy as np

L = [[1,None,3],["The", "quick", object]]
np.ravel(L)
# array([1, None, 3, 'The', 'quick', <class 'object'>], dtype=object)

Also worth mentioning that you needn’t go through numpy at all.


Answer 4

Use the following code:

import numpy as np

myArray = np.array([1,2,4])  # np.array converts the [1,2,4] list into an array
print(myArray)

Answer 5

If variable b has a list, then you can simply do the below:

Create a new variable "a" as a=[], then assign the list to "a" as a=b.

Now "a" has all the components of list "b" as an array.

So you have successfully converted the list to an array.


Copy a 2D array into 3 dimensions, N times (Python)

Question: Copy a 2D array into 3 dimensions, N times (Python)

I’d like to copy a numpy 2D array into a third dimension. For example, given the (2D) numpy array:

import numpy as np
arr = np.array([[1,2],[1,2]])
# arr.shape = (2, 2)

convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:

new_arr = np.array([[[1,2],[1,2]],[[1,2],[1,2]],[[1,2],[1,2]]])
# new_arr.shape = (3, 2, 2)

Answer 0

Probably the cleanest way is to use np.repeat:

a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2,  2)

# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)

print(b.shape)
# (2, 2, 3)

print(b[:, :, 0])
# [[1 2]
#  [1 2]]

print(b[:, :, 1])
# [[1 2]
#  [1 2]]

print(b[:, :, 2])
# [[1 2]
#  [1 2]]

Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let’s say I wanted to add a (3,) vector:

c = np.array([1, 2, 3])

to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it’s much simpler and quicker to do this:

d = a[..., None] + c[None, None, :]

Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets ‘broadcast’ out along the dimensions of size 1, giving me a result of shape (2, 2, 3):

print(d.shape)
# (2,  2, 3)

print(d[..., 0])    # a + c[0]
# [[2 3]
#  [2 3]]

print(d[..., 1])    # a + c[1]
# [[3 4]
#  [3 4]]

print(d[..., 2])    # a + c[2]
# [[4 5]
#  [4 5]]

Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.


* Although I included them for clarity, the None indices into c aren’t actually necessary – you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:

a = np.ones((6, 1, 4, 3, 1))  # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2))     #     5 x 1 x 3 x 2
result = a + b                # 6 x 5 x 4 x 3 x 2

Answer 1

Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:

import numpy as np
b = np.dstack([a]*num_repeats)

The trick is to wrap the matrix a into a list of a single element, then using the * operator to duplicate the elements in this list num_repeats times.

For example, if:

a = np.array([[1, 2], [1, 2]])
num_repeats = 5

This repeats the array of [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):

In [110]: import numpy as np

In [111]: num_repeats = 5

In [112]: a = np.array([[1, 2], [1, 2]])

In [113]: b = np.dstack([a]*num_repeats)

In [114]: b[:,:,0]
Out[114]: 
array([[1, 2],
       [1, 2]])

In [115]: b[:,:,1]
Out[115]: 
array([[1, 2],
       [1, 2]])

In [116]: b[:,:,2]
Out[116]: 
array([[1, 2],
       [1, 2]])

In [117]: b[:,:,3]
Out[117]: 
array([[1, 2],
       [1, 2]])

In [118]: b[:,:,4]
Out[118]: 
array([[1, 2],
       [1, 2]])

In [119]: b.shape
Out[119]: (2, 2, 5)

At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.


Answer 2

Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim

Introduced in NumPy 1.10.0, we can leverage numpy.broadcast_to to simply generate a 3D view into the 2D input array. The benefit would be no extra memory overhead and virtually free runtime. This would be essential in cases where the arrays are big and we are okay to work with views. Also, this would work with generic n-dim cases.

I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.

Stack along first axis

If we want to stack input arr along the first axis, the solution with np.broadcast_to to create 3D view would be –

np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here

Stack along third/last axis

To stack input arr along the third axis, the solution to create 3D view would be –

np.broadcast_to(arr[...,None],arr.shape+(3,))

If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be –

np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()

Here’s how the stacking works for the two cases, shown with their shape information for a sample case –

# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)

# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)

# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)

Same solution(s) would work to extend a n-dim input to n+1-dim view output along the first and last axes. Let’s explore some higher dim cases –

3D input case :

In [58]: arr = np.random.rand(4,5,6)

# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)

# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)

4D input case :

In [61]: arr = np.random.rand(4,5,6,7)

# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)

# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)

and so on.

Timings

Let’s use a large sample 2D case and get the timings and verify output being a view.

# Sample input array
In [19]: arr = np.random.rand(1000,1000)

Let’s prove that the proposed solution is a view indeed. We will use stacking along first axis (results would be very similar for stacking along the third axis) –

In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True

Let’s get the timings to show that it’s virtually free –

In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop

In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop

Being a view, increasing N from 3 to 3000 changed nothing on timings and both are negligible on timing units. Hence, efficient both on memory and performance!


Answer 3

import numpy as np

N = 3  # number of copies, as in the question
A = np.array([[1,2],[3,4]])
B = np.asarray([A]*N)

Edit @Mr.F, to preserve dimension order:

B=B.T

Answer 4

Here’s a broadcasting example that does exactly what was requested.

import numpy as np

a = np.array([[1, 2], [1, 2]])
a = a[:,:,None]
b = np.array([1]*5)[None,None,:]

Then b*a is the desired result and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.


Answer 5

This can now also be achieved using np.tile as follows:

import numpy as np

a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))

b.shape
(3,2,2)

b
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]]])

How to hash a string into an 8-digit number?

Question: How to hash a string into an 8-digit number?

Is there any way that I can hash a random string into an 8-digit number without implementing any algorithms myself?


Answer 0

Yes, you can use the built-in hashlib module or the built-in hash function. Then, chop off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

Answer 1

Raymond’s answer is great for python2 (though, you don’t need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you’ll need to make sure you are passing an encoded string. These days, in most circumstances, it’s probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

This means the hash()-based solution suggested, which can be shortened to just:

hash(s) % 10**8

will only return the same value within a given script run:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

So, depending on if this matters in your application (it did in mine), you’ll probably want to stick to the hashlib-based approach.
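
Wrapping the hashlib-based approach in a small helper (a sketch; the function name is made up, and the expected output is the value shown above for 'your string'):

import hashlib

def eight_digit_hash(s):
    # stable across processes and Python versions, unlike hash()
    return int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8

print(eight_digit_hash('your string'))   # 80262417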


Answer 2

Just to complete JJC answer, in python 3.5.3 the behavior is correct if you use hashlib this way:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

Answer 3

I am sharing our nodejs implementation of the solution as implemented by @Raymond Hettinger.

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

Numpy array assignment with copy

Question: Numpy array assignment with copy

For example, if we have a numpy array A, and we want a numpy array B with the same elements.

What is the difference between the following (see below) methods? When is additional memory allocated, and when is it not?

  1. B = A
  2. B[:] = A (same as B[:]=A[:]?)
  3. numpy.copy(B, A)

Answer 0

All three versions do different things:

  1. B = A

    This binds a new name B to the existing object already named A. Afterwards they refer to the same object, so if you modify one in place, you’ll see the change through the other one too.

  2. B[:] = A (same as B[:]=A[:]?)

    This copies the values from A into an existing array B. The two arrays must have the same shape for this to work. B[:] = A[:] does the same thing (but B = A[:] would do something more like 1).

  3. numpy.copy(B, A)

    This is not legal syntax. You probably meant B = numpy.copy(A). This is almost the same as 2, but it creates a new array, rather than reusing the B array. If there were no other references to the previous B value, the end result would be the same as 2, but it will use more memory temporarily during the copy.

    Or maybe you meant numpy.copyto(B, A), which is legal, and is equivalent to 2?
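
A small sketch of the three behaviours described above (using a throwaway example array):

import numpy as np

A = np.array([1, 2, 3])

B = A                  # 1. same object: modifying B in place also changes A
B[0] = 99
print(A)               # [99  2  3]

A = np.array([1, 2, 3])
B = np.empty_like(A)
B[:] = A               # 2. copies the values into the existing array B
B[0] = 99
print(A)               # [1 2 3] -- unchanged

C = np.copy(A)         # 3. a new array with the same values (see also np.copyto(B, A))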


Answer 1

  1. B=A creates a reference
  2. B[:]=A makes a copy
  3. numpy.copy(B,A) makes a copy

The last two need additional memory.

To make a deep copy you need to use B = copy.deepcopy(A)


Answer 2

This is the only working answer for me:

B=numpy.array(A)

How to properly save and load numpy.array() data?

Question: How to properly save and load numpy.array() data?

I wonder, how to save and load numpy.array data properly. Currently I’m using the numpy.savetxt() method. For example, if I got an array markers, which looks like this:

I try to save it by the use of:

numpy.savetxt('markers.txt', markers)

In other script I try to open previously saved file:

markers = np.fromfile("markers.txt")

And that’s what I get…

Saved data first looks like this:

0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00

But when I save just loaded data by the use of the same method, ie. numpy.savetxt() it looks like this:

1.398043286095131769e-76
1.398043286095288860e-76
1.396426376485745879e-76
1.398043286055061908e-76
1.398043286095288860e-76
1.182950697433698368e-76
1.398043275797188953e-76
1.398043286095288860e-76
1.210894289234927752e-99
1.398040649781712473e-76

What am I doing wrong? PS there are no other “backstage” operation which I perform. Just saving and loading, and that’s what I get. Thank you in advance.


Answer 0

The most reliable way I have found to do this is to use np.savetxt with np.loadtxt and not np.fromfile which is better suited to binary files written with tofile. The np.fromfile and np.tofile methods write and read binary files whereas np.savetxt writes a text file. So, for example:

In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.savetxt('test1.txt', a, fmt='%d')
In [3]: b = np.loadtxt('test1.txt', dtype=int)
In [4]: a == b
Out[4]: array([ True,  True,  True,  True], dtype=bool)

Or:

In [5]: a.tofile('test2.dat')
In [6]: c = np.fromfile('test2.dat', dtype=int)
In [7]: c == a
Out[7]: array([ True,  True,  True,  True], dtype=bool)

I use the former method even if it is slower and creates bigger files (sometimes): the binary format can be platform dependent (for example, the file format depends on the endianness of your system).

There is a platform independent format for NumPy arrays, which can be saved and read with np.save and np.load:

In  [8]: np.save('test3.npy', a)    # .npy extension is added if not given
In  [9]: d = np.load('test3.npy')
In [10]: a == d
Out[10]: array([ True,  True,  True,  True], dtype=bool)

Answer 1

np.save('data.npy', num_arr) # save
new_num_arr = np.load('data.npy') # load

Answer 2

np.fromfile() has a sep= keyword argument:

Separator between items if file is a text file. Empty (“”) separator means the file should be treated as binary. Spaces (” ”) in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace.

The default value of sep="" means that np.fromfile() tries to read it as a binary file rather than a space-separated text file, so you get nonsense values back. If you use np.fromfile('markers.txt', sep=" ") you will get the result you are looking for.

However, as others have pointed out, np.loadtxt() is the preferred way to convert text files to numpy arrays, and unless the file needs to be human-readable it is usually better to use binary formats instead (e.g. np.load()/np.save()).
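
A minimal round trip illustrating the point (a sketch; markers.txt is the file name from the question):

import numpy as np

markers = np.zeros(10)
np.savetxt('markers.txt', markers)

loaded = np.fromfile('markers.txt', sep=' ')   # parse as whitespace-separated text
print(np.array_equal(loaded, markers))         # True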


Answer 3

For a short answer, you should use np.save and np.load. The advantage of these is that they are made by the developers of the numpy library and they already work (plus they are likely already optimized nicely), e.g.

import numpy as np
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

np.save(path/'x', x)
np.save(path/'y', y)

x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')

print(x is x_loaded) # False
print(x == x_loaded) # [[ True  True  True  True  True]]

Expanded answer:

In the end it really depends on your needs, because you can also save it in a human readable format (see this Dump a NumPy array into a csv file) or even with other libraries if your files are extremely large (see this best way to preserve numpy arrays on disk for an expanded discussion).

However, (making an expansion since you use the word “properly” in your question) I still think using the numpy function out of the box (and most code!) most likely satisfies most user needs. The most important reason is that it already works. Trying to use something else for any other reason might take you on an unexpectedly LONG rabbit hole to figure out why it doesn’t work and force it to work.

Take for example trying to save it with pickle. I tried that just for fun and it took me at least 30 minutes to realize that pickle wouldn’t save my stuff unless I opened & read the file in bytes mode with wb. Took time to google, try things, understand the error message etc… Small detail, but the fact that it already required me to open a file complicated things in unexpected ways. Add to that that it required me to re-read this (which btw is sort of confusing): Difference between modes a, a+, w, w+, and r+ in built-in open function?

So if there is an interface that meets your needs, use it, unless you have a (very) good reason (e.g. compatibility with matlab, or for some reason you really want to read the file and printing in python really doesn’t meet your needs, which might be questionable). Furthermore, most likely if you need to optimize it you’ll find out later down the line (rather than spend ages debugging useless stuff like opening a simple numpy file).

So use the interface numpy provides. It might not be perfect, but it’s most likely fine, especially for a library that’s been around as long as numpy.

I already spent time saving and loading data with numpy in a bunch of ways, so have fun with it; hope it helps!

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

## using loading npy, npz files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

Some comments on what I learned:

  • np.save as expected, this already compresses it well (see https://stackoverflow.com/a/55750128/1601580), works out of the box without any file opening. Clean. Easy. Efficient. Use it.
  • np.savez uses an uncompressed format (see docs: Save several arrays into a single file in uncompressed .npz format). If you decide to use this (you were warned to go away from the standard solution, so expect bugs!) you might discover that you need to use argument names to save it, unless you want to use the default names. So don’t use this if the first already works (or if anything else works, use that!)
  • Pickle also allows for arbitrary code execution. Some people might not want to use this for security reasons.
  • human readable files are expensive to make etc. Probably not worth it.
  • there is something called hdf5 for large files. Cool! https://stackoverflow.com/a/9619713/1601580

Note this is not an exhaustive answer. But for other resources check this:


Input and output numpy arrays to h5py

Question: Input and output numpy arrays to h5py

I have a Python code whose output is a sized matrix, whose entries are all of the type float. If I save it with the extension .dat the file size is of the order of 500 MB. I read that using h5py reduces the file size considerably. So, let’s say I have the 2D numpy array named A. How do I save it to an h5py file? Also, how do I read the same file and put it as a numpy array in a different code, as I need to do manipulations with the array?


Answer 0

h5py provides a model of datasets and groups. The former is basically arrays and the latter you can think of as directories. Each is named. You should look at the documentation for the API and examples:

http://docs.h5py.org/en/latest/quick.html

A simple example where you are creating all of the data upfront and just want to save it to an hdf5 file would look something like:

In [1]: import numpy as np
In [2]: import h5py
In [3]: a = np.random.random(size=(100,20))
In [4]: h5f = h5py.File('data.h5', 'w')
In [5]: h5f.create_dataset('dataset_1', data=a)
Out[5]: <HDF5 dataset "dataset_1": shape (100, 20), type "<f8">

In [6]: h5f.close()

You can then load that data back in using:

In [10]: h5f = h5py.File('data.h5','r')
In [11]: b = h5f['dataset_1'][:]
In [12]: h5f.close()

In [13]: np.allclose(a,b)
Out[13]: True

Definitely check out the docs:

http://docs.h5py.org

Writing to hdf5 file depends either on h5py or pytables (each has a different python API that sits on top of the hdf5 file specification). You should also take a look at other simple binary formats provided by numpy natively such as np.save, np.savez etc:

http://docs.scipy.org/doc/numpy/reference/routines.io.html


Answer 1

A cleaner way to handle file open/close and avoid memory leaks:

Prep:

import numpy as np
import h5py

data_to_write = np.random.random(size=(100,20)) # or some such

Write:

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset",  data=data_to_write)

Read:

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]

How to return 0 with divide by zero

Question: How to return 0 with divide by zero

I’m trying to perform an element-wise divide in python, but if a zero is encountered, I need the quotient to just be zero.

For example:

array1 = np.array([0, 1, 2])
array2 = np.array([0, 1, 1])

array1 / array2 # should be np.array([0, 1, 2])

I could always just use a for-loop through my data, but to really utilize numpy’s optimizations, I need the divide function to return 0 upon divide by zero errors instead of ignoring the error.

Unless I’m missing something, it doesn’t seem numpy.seterr() can return values upon errors. Does anyone have any other suggestions on how I could get the best out of numpy while setting my own divide by zero error handling?


Answer 0

In numpy v1.7+, you can take advantage of the “where” option for ufuncs. You can do things in one line and you don’t have to deal with the errstate context manager.

>>> a = np.array([-1, 0, 1, 2, 3], dtype=float)
>>> b = np.array([ 0, 0, 0, 2, 2], dtype=float)

# If you don't pass `out` the indices where (b == 0) will be uninitialized!
>>> c = np.divide(a, b, out=np.zeros_like(a), where=b!=0)
>>> print(c)
[ 0.   0.   0.   1.   1.5]

In this case, it does the divide calculation anywhere ‘where’ b does not equal zero. When b does equal zero, then it remains unchanged from whatever value you originally gave it in the ‘out’ argument.


Answer 1

Building on @Franck Dernoncourt’s answer, fixing -1 / 0:

def div0( a, b ):
    """ ignore / 0, div0( [-1, 0, 1], 0 ) -> [0, 0, 0] """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide( a, b )
        c[ ~ np.isfinite( c )] = 0  # -inf inf NaN
    return c

div0( [-1, 0, 1], 0 )
array([0, 0, 0])

Answer 2

Building on the other answers, and improving on:

Code:

import numpy as np

a = np.array([0,0,1,1,2], dtype='float')
b = np.array([0,1,0,1,3], dtype='float')

with np.errstate(divide='ignore', invalid='ignore'):
    c = np.true_divide(a,b)
    c[c == np.inf] = 0
    c = np.nan_to_num(c)

print('c: {0}'.format(c))

Output:

c: [ 0.          0.          0.          1.          0.66666667]

Answer 3

One-liner (throws warning)

np.nan_to_num(array1 / array2)

Answer 4

Try doing it in two steps. Division first, then replace.

with numpy.errstate(divide='ignore'):
    result = numerator / denominator
    result[denominator == 0] = 0

The numpy.errstate line is optional, and just prevents numpy from telling you about the “error” of dividing by zero, since you’re already intending to do so, and handling that case.
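
For example, with values like those in the question (a sketch; invalid='ignore' is added here so the 0/0 case is silenced as well):

import numpy as np

numerator = np.array([0., 1., 2.])
denominator = np.array([0., 1., 1.])

with np.errstate(divide='ignore', invalid='ignore'):
    result = numerator / denominator
    result[denominator == 0] = 0

print(result)   # [0. 1. 2.]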


Answer 5

You can also replace based on inf, only if the array dtypes are floats, as per this answer:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> c = a / b
>>> c
array([ inf,   2.,   1.])
>>> c[c == np.inf] = 0
>>> c
array([ 0.,  2.,  1.])

Answer 6

One answer I found searching a related question was to manipulate the output based upon whether the denominator was zero or not.

Suppose arrayA and arrayB have been initialized, but arrayB has some zeros. We could do the following if we want to compute arrayC = arrayA / arrayB safely.

In this case, whenever I have a divide by zero in one of the cells, I set the cell to be equal to myOwnValue, which in this case would be zero

myOwnValue = 0
arrayC = np.zeros(arrayA.shape)   # .shape is an attribute, not a method
indNonZeros = np.where(arrayB != 0)
indZeros = np.where(arrayB == 0)  # comparison, not assignment

# division in two steps: first with nonzero cells, and then zero cells
arrayC[indNonZeros] = arrayA[indNonZeros] / arrayB[indNonZeros]
arrayC[indZeros] = myOwnValue # Look at footnote

Footnote: In retrospect, this line is unnecessary anyways, since arrayC[i] is instantiated to zero. But if were the case that myOwnValue != 0, this operation would do something.


Answer 7

Another solution worth mentioning:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> b_inv = np.array([1/i if i!=0 else 0 for i in b])
>>> a*b_inv
array([0., 2., 1.])