Tag Archives: arrays

Convert a 1D array into a 2D array in numpy

Question: Convert a 1D array into a 2D array in numpy

I want to convert a 1-dimensional array into a 2-dimensional array by specifying the number of columns in the 2D array. Something that would work like this:

> import numpy as np
> A = np.array([1,2,3,4,5,6])
> B = vec2matrix(A,ncol=2)
> B
array([[1, 2],
       [3, 4],
       [5, 6]])

Does numpy have a function that works like my made-up function “vec2matrix”? (I understand that you can index a 1D array like a 2D array, but that isn’t an option in the code I have – I need to make this conversion.)


Answer 0

You want to reshape the array.

B = np.reshape(A, (-1, 2))

where -1 infers the size of the new dimension from the size of the input array.
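
For instance, checking the behaviour with the array from the question (a quick sketch, assuming numpy is imported as np):

import numpy as np

A = np.array([1, 2, 3, 4, 5, 6])
B = np.reshape(A, (-1, 2))   # -1 lets numpy infer 3 rows from the 6 elements
print(B)
# [[1 2]
#  [3 4]
#  [5 6]]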


Answer 1

You have two options:

  • If you no longer want the original shape, the easiest is just to assign a new shape to the array

    a.shape = (a.size//ncols, ncols)
    

    You can switch the a.size//ncols by -1 to compute the proper shape automatically. Make sure that a.shape[0]*a.shape[1]=a.size, else you’ll run into some problem.

  • You can get a new array with the np.reshape function, that works mostly like the version presented above

    new = np.reshape(a, (-1, ncols))
    

    When it’s possible, new will be just a view of the initial array a, meaning that the data are shared. In some cases, though, the new array will be a copy instead. Note that np.reshape also accepts an optional keyword order that lets you switch from row-major C order to column-major Fortran order. np.reshape is the function version of the a.reshape method.

If you can’t respect the requirement a.shape[0]*a.shape[1]=a.size, you’re stuck with having to create a new array. You can use the np.resize function and mix it with np.reshape, such as

>>> a =np.arange(9)
>>> np.resize(a, 10).reshape(5,2)
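
As a small illustration of the order keyword mentioned above (a sketch, not part of the original answer):

import numpy as np

a = np.arange(6)
print(np.reshape(a, (3, 2)))             # C (row-major) order: [[0 1] [2 3] [4 5]]
print(np.reshape(a, (3, 2), order='F'))  # Fortran (column-major) order: [[0 3] [1 4] [2 5]]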

Answer 2

Try something like:

B = np.reshape(A,(-1,ncols))

You’ll need to make sure that you can divide the number of elements in your array by ncols though. You can also play with the order in which the numbers are pulled into B using the order keyword.


Answer 3

If your sole purpose is to convert a 1d array X to a 2d array just do:

X = np.reshape(X,(1, X.size))
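
For example, a quick check of the resulting shape (a sketch with made-up data):

import numpy as np

X = np.array([1, 2, 3, 4])
X = np.reshape(X, (1, X.size))
print(X.shape)   # (1, 4)
print(X)         # [[1 2 3 4]]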

Answer 4

import numpy as np
array = np.arange(8) 
print("Original array : \n", array)
array = np.arange(8).reshape(2, 4)
print("New array : \n", array)

Answer 5

some_array.shape = (1,)+some_array.shape

or get a new one

another_array = numpy.reshape(some_array, (1,)+some_array.shape)

This increases the number of dimensions by one, which is equivalent to adding a pair of brackets at the outermost level.
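
A short check of the shapes involved (a sketch using a small sample array):

import numpy as np

some_array = np.arange(6).reshape(2, 3)
print(some_array.shape)        # (2, 3)

another_array = np.reshape(some_array, (1,) + some_array.shape)
print(another_array.shape)     # (1, 2, 3)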


Answer 6

You can use flatten() from the numpy package.

import numpy as np
a = np.array([[1, 2],
       [3, 4],
       [5, 6]])
a_flat = a.flatten()
print(f"original array: {a} \nflattened array = {a_flat}")

Output:

original array: [[1 2]
 [3 4]
 [5 6]] 
flattened array = [1 2 3 4 5 6]

Answer 7

Change 1D array into 2D array without using Numpy.

l = [i for i in range(1,21)]
part = 3
new = []
start, end = 0, part


while end <= len(l):
    temp = []
    for i in range(start, end):
        temp.append(l[i])
    new.append(temp)
    start += part
    end += part
print("new values:  ", new)


# for uneven cases
temp = []
while start < len(l):
    temp.append(l[start])
    start += 1
new.append(temp)  # append the leftover chunk once, after it has been collected
print("new values for uneven cases:   ", new)

Selecting specific rows and columns from a NumPy array

Question: Selecting specific rows and columns from a NumPy array

I’ve been going crazy trying to figure out what stupid thing I’m doing wrong here.

I’m using NumPy, and I have specific row indices and specific column indices that I want to select from. Here’s the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print a[[0, 1, 3], :]
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print a[[0, 1, 3], 2]
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
print a[[0,1,3], [0,2]]
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I’m expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]

Answer 0

Fancy indexing requires you to provide all indices for each dimension. You are providing 3 indices for the first one, and only 2 for the second one, hence the error. You want to do something like this:

>>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

That is of course a pain to write, so you can let broadcasting help you:

>>> a[[[0], [1], [3]], [0, 2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

This is much simpler to do if you index with arrays, not lists:

>>> row_idx = np.array([0, 1, 3])
>>> col_idx = np.array([0, 2])
>>> a[row_idx[:, None], col_idx]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Answer 1

As Toan suggests, a simple hack would be to just select the rows first, and then select the columns over that.

>>> a[[0,1,3], :]            # Returns the rows you want
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]]  # Selects the columns you want as well
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

[Edit] The built-in method: np.ix_

I recently discovered that numpy gives you an in-built one-liner to doing exactly what @Jaime suggested, but without having to use broadcasting syntax (which suffers from lack of readability). From the docs:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].

So you use it like this:

>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

And the way it works is that it takes care of aligning arrays the way Jaime suggested, so that broadcasting happens properly:

>>> np.ix_([0,1,3], [0,2])
(array([[0],
        [1],
        [3]]), array([[0, 2]]))

Also, as MikeC says in a comment, np.ix_ has the advantage of returning a view, which my first (pre-edit) answer did not. This means you can now assign to the indexed array:

>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a    
array([[-1,  1, -1,  3],
       [-1,  5, -1,  7],
       [ 8,  9, 10, 11],
       [-1, 13, -1, 15],
       [16, 17, 18, 19]])

Answer 2

USE:

>>> a[[0,1,3]][:,[0,2]]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

OR:

>>> a[[0,1,3],::2]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

Answer 3

Using np.ix_ is the most convenient way to do it (as answered by others), but here is another interesting way to do it:

>>> rows = [0, 1, 3]
>>> cols = [0, 2]

>>> a[rows].T[cols].T

array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

What does population[:] mean?

Question: What does population[:] mean?

I’m analyzing some Python code and I don’t know what

pop = population[:]

means. Is it something like array lists in Java or like a bi-dimensional array?


Answer 0

It is an example of slice notation, and what it does depends on the type of population. If population is a list, this line will create a shallow copy of the list. For an object of type tuple or a str, it will do nothing (the line will do the same without [:]), and for a (say) NumPy array, it will create a new view to the same data.
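
A minimal sketch contrasting the list and NumPy cases described above (the variable names are made up for illustration):

import numpy as np

population = [1, 2, 3]
pop = population[:]      # shallow copy of the list
pop[0] = 99
print(population)        # [1, 2, 3] -- the original list is unchanged

arr = np.array([1, 2, 3])
view = arr[:]            # a new view onto the same data
view[0] = 99
print(arr)               # [99  2  3] -- the change is visible through the view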


Answer 1

It might also help to know that a list slice in general makes a copy of part of the list. E.g. population[2:4] will return a list containing population[2] and population[3] (slicing is right-exclusive). Leaving out the left and right index, as in population[:], makes them default to 0 and len(population) respectively, thereby selecting the entire list. Hence this is a common idiom to make a copy of a list.
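
For instance, a minimal sketch of the defaults described above:

population = ['a', 'b', 'c', 'd', 'e']
print(population[2:4])      # ['c', 'd'] -- right-exclusive
copy = population[:]        # same as population[0:len(population)]
print(copy)                 # ['a', 'b', 'c', 'd', 'e']
print(copy is population)   # False -- it is a new list object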


Answer 2

well… this really depends on the context. Ultimately, it passes a slice object (slice(None,None,None)) to one of the following methods: __getitem__, __setitem__ or __delitem__. (Actually, if the object has a __getslice__, that will be used instead of __getitem__, but that is now deprecated and shouldn’t be used).

Objects can do what they want with the slice.

In the context of:

x = obj[:]

This will call obj.__getitem__ with the slice object passed in. In fact, this is completely equivalent to:

x = obj[slice(None,None,None)]

(although the former is probably more efficient because it doesn’t have to look up the slice constructor — It’s all done in bytecode).

For most objects, this is a way to create a shallow copy of a portion of the sequence.

Next:

x[:] = obj

Is a way to set the items (it calls __setitem__) based on obj.

and, I think you can probably guess what:

del x[:]

calls ;-).

You can also pass different slices:

x[1:4]

constructs slice(1,4,None)

x[::-1]

constructs slice(None,None,-1) and so forth. Further reading: Explain Python’s slice notation
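
To see which slice object each syntax produces, here is a minimal sketch with a hypothetical Demo class whose __getitem__ simply returns its argument:

class Demo:
    def __getitem__(self, key):
        # return the key so we can inspect what the [...] syntax passed in
        return key

d = Demo()
print(d[:])      # slice(None, None, None)
print(d[1:4])    # slice(1, 4, None)
print(d[::-1])   # slice(None, None, -1)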


Answer 3

It is a slice from the beginning of the sequence to the end, usually producing a shallow copy.

(Well, it’s more than that, but you don’t need to care yet.)


Answer 4

It creates a copy of the list, versus just assigning a new name for the already existing list.


Answer 5

[:]
is used as a limiter or for slicing in arrays and hashes,
e.g.:
[1:5] displays the values from index 1 inclusive to 5 exclusive, i.e. 1-4
[start:end]

It is basically used for slicing arrays: the brackets accept a variable that refers to the value or key to display, and ":" is used to limit or slice the entire array into chunks.


Converting a list to an array in order to use the ravel() function

Question: Converting a list to an array in order to use the ravel() function

I have a list in python and I want to convert it to an array to be able to use ravel() function.


Answer 0

Use numpy.asarray:

import numpy as np
myarray = np.asarray(mylist)
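
Since the goal was to call ravel(), a quick follow-up sketch (mylist here is a made-up nested list):

import numpy as np

mylist = [[1, 2], [3, 4]]
myarray = np.asarray(mylist)
print(myarray.ravel())   # [1 2 3 4]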

Answer 1

create an int array and a list

from array import array
listA = list(range(0,50))
for item in listA:
    print(item)
arrayA = array("i", listA)
for item in arrayA:
    print(item)

Answer 2

I wanted a way to do this without using an extra module. First turn list to string, then append to an array:

dataset_list = ''.join(input_list)
dataset_array = []
for item in dataset_list.split(';'): # comma, or other
    dataset_array.append(item)

Answer 3

If all you want is calling ravel on your (nested, I s’pose?) list, you can do that directly, numpy will do the casting for you:

import numpy as np

L = [[1,None,3],["The", "quick", object]]
np.ravel(L)
# array([1, None, 3, 'The', 'quick', <class 'object'>], dtype=object)

Also worth mentioning that you needn’t go through numpy at all.


Answer 4

Use the following code:

import numpy as np

myArray = np.array([1,2,4])  # np.array converts the [1,2,4] list into an array
print(myArray)

Answer 5

If variable b has a list, then you can simply do the below:

Create a new variable "a" as a=[], then assign the list to "a" as a=b.

Now "a" has all the components of list "b" as an array.

So you have successfully converted the list to an array.


Copy a 2D array into 3 dimensions, N times (Python)

Question: Copy a 2D array into 3 dimensions, N times (Python)

I’d like to copy a numpy 2D array into a third dimension. For example, given the (2D) numpy array:

import numpy as np
arr = np.array([[1,2],[1,2]])
# arr.shape = (2, 2)

convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:

new_arr = np.array([[[1,2],[1,2]],[[1,2],[1,2]],[[1,2],[1,2]]])
# new_arr.shape = (3, 2, 2)

Answer 0

Probably the cleanest way is to use np.repeat:

a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2,  2)

# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)

print(b.shape)
# (2, 2, 3)

print(b[:, :, 0])
# [[1 2]
#  [1 2]]

print(b[:, :, 1])
# [[1 2]
#  [1 2]]

print(b[:, :, 2])
# [[1 2]
#  [1 2]]

Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let’s say I wanted to add a (3,) vector:

c = np.array([1, 2, 3])

to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it’s much simpler and quicker to do this:

d = a[..., None] + c[None, None, :]

Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets ‘broadcast’ out along the dimensions of size 1, giving me a result of shape (2, 2, 3):

print(d.shape)
# (2,  2, 3)

print(d[..., 0])    # a + c[0]
# [[2 3]
#  [2 3]]

print(d[..., 1])    # a + c[1]
# [[3 4]
#  [3 4]]

print(d[..., 2])    # a + c[2]
# [[4 5]
#  [4 5]]

Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.


* Although I included them for clarity, the None indices into c aren’t actually necessary – you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:

a = np.ones((6, 1, 4, 3, 1))  # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2))     #     5 x 1 x 3 x 2
result = a + b                # 6 x 5 x 4 x 3 x 2

Answer 1

Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:

import numpy as np
b = np.dstack([a]*num_repeats)

The trick is to wrap the matrix a into a list of a single element, then using the * operator to duplicate the elements in this list num_repeats times.

For example, if:

a = np.array([[1, 2], [1, 2]])
num_repeats = 5

This repeats the array of [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):

In [110]: import numpy as np

In [111]: num_repeats = 5

In [112]: a = np.array([[1, 2], [1, 2]])

In [113]: b = np.dstack([a]*num_repeats)

In [114]: b[:,:,0]
Out[114]: 
array([[1, 2],
       [1, 2]])

In [115]: b[:,:,1]
Out[115]: 
array([[1, 2],
       [1, 2]])

In [116]: b[:,:,2]
Out[116]: 
array([[1, 2],
       [1, 2]])

In [117]: b[:,:,3]
Out[117]: 
array([[1, 2],
       [1, 2]])

In [118]: b[:,:,4]
Out[118]: 
array([[1, 2],
       [1, 2]])

In [119]: b.shape
Out[119]: (2, 2, 5)

At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.


Answer 2

Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim

Introduced in NumPy 1.10.0, we can leverage numpy.broadcast_to to simply generate a 3D view into the 2D input array. The benefit would be no extra memory overhead and virtually free runtime. This would be essential in cases where the arrays are big and we are okay to work with views. Also, this would work with generic n-dim cases.

I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.

Stack along first axis

If we want to stack input arr along the first axis, the solution with np.broadcast_to to create 3D view would be –

np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here

Stack along third/last axis

To stack input arr along the third axis, the solution to create 3D view would be –

np.broadcast_to(arr[...,None],arr.shape+(3,))

If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be –

np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()

Here’s how the stacking works for the two cases, shown with their shape information for a sample case –

# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)

# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)

# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)

Same solution(s) would work to extend a n-dim input to n+1-dim view output along the first and last axes. Let’s explore some higher dim cases –

3D input case :

In [58]: arr = np.random.rand(4,5,6)

# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)

# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)

4D input case :

In [61]: arr = np.random.rand(4,5,6,7)

# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)

# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)

and so on.

Timings

Let’s use a large sample 2D case and get the timings and verify output being a view.

# Sample input array
In [19]: arr = np.random.rand(1000,1000)

Let’s prove that the proposed solution is a view indeed. We will use stacking along first axis (results would be very similar for stacking along the third axis) –

In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True

Let’s get the timings to show that it’s virtually free –

In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop

In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop

Being a view, increasing N from 3 to 3000 changed nothing on timings and both are negligible on timing units. Hence, efficient both on memory and performance!


Answer 3

import numpy as np

N = 3  # number of copies, as in the question
A = np.array([[1,2],[3,4]])
B = np.asarray([A]*N)

Edit @Mr.F, to preserve dimension order:

B=B.T

Answer 4

Here’s a broadcasting example that does exactly what was requested.

import numpy as np

a = np.array([[1, 2], [1, 2]])
a = a[:,:,None]
b = np.array([1]*5)[None,None,:]

Then b*a is the desired result and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.


Answer 5

This can now also be achieved using np.tile as follows:

import numpy as np

a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))

b.shape
(3,2,2)

b
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]],

       [[1, 2],
        [1, 2]]])

How to hash a string into an 8-digit number?

Question: How to hash a string into an 8-digit number?

Is there any way that I can hash a random string into an 8-digit number without implementing any algorithms myself?


Answer 0

Yes, you can use the built-in hashlib module or the built-in hash function. Then, chop off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

Answer 1

Raymond’s answer is great for python2 (though, you don’t need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you’ll need to make sure you are passing an encoded string. These days, in most circumstances, it’s probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

This means the hash()-based solution suggested, which can be shortened to just:

hash(s) % 10**8

will only return the same value within a given script run:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

So, depending on if this matters in your application (it did in mine), you’ll probably want to stick to the hashlib-based approach.
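
Wrapping the hashlib-based approach in a small helper (a sketch; the function name is made up, and the expected output is the value shown above for 'your string'):

import hashlib

def eight_digit_hash(s):
    # stable across processes and Python versions, unlike hash()
    return int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8

print(eight_digit_hash('your string'))   # 80262417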


Answer 2

Just to complete JJC answer, in python 3.5.3 the behavior is correct if you use hashlib this way:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

Answer 3

I am sharing our nodejs implementation of the solution as implemented by @Raymond Hettinger.

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

Numpy array assignment with copy

Question: Numpy array assignment with copy

For example, if we have a numpy array A, and we want a numpy array B with the same elements.

What is the difference between the following (see below) methods? When is additional memory allocated, and when is it not?

  1. B = A
  2. B[:] = A (same as B[:]=A[:]?)
  3. numpy.copy(B, A)

Answer 0

All three versions do different things:

  1. B = A

    This binds a new name B to the existing object already named A. Afterwards they refer to the same object, so if you modify one in place, you’ll see the change through the other one too.

  2. B[:] = A (same as B[:]=A[:]?)

    This copies the values from A into an existing array B. The two arrays must have the same shape for this to work. B[:] = A[:] does the same thing (but B = A[:] would do something more like 1).

  3. numpy.copy(B, A)

    This is not legal syntax. You probably meant B = numpy.copy(A). This is almost the same as 2, but it creates a new array, rather than reusing the B array. If there were no other references to the previous B value, the end result would be the same as 2, but it will use more memory temporarily during the copy.

    Or maybe you meant numpy.copyto(B, A), which is legal, and is equivalent to 2?
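
A small sketch of the three behaviours described above (using a throwaway example array):

import numpy as np

A = np.array([1, 2, 3])

B = A                  # 1. same object: modifying B in place also changes A
B[0] = 99
print(A)               # [99  2  3]

A = np.array([1, 2, 3])
B = np.empty_like(A)
B[:] = A               # 2. copies the values into the existing array B
B[0] = 99
print(A)               # [1 2 3] -- unchanged

C = np.copy(A)         # 3. a new array with the same values (see also np.copyto(B, A))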


Answer 1

  1. B=A creates a reference
  2. B[:]=A makes a copy
  3. numpy.copy(B,A) makes a copy

The last two need additional memory.

To make a deep copy you need to use B = copy.deepcopy(A)


Answer 2

This is the only working answer for me:

B=numpy.array(A)

How to properly save and load numpy.array() data?

Question: How to properly save and load numpy.array() data?

I wonder, how to save and load numpy.array data properly. Currently I’m using the numpy.savetxt() method. For example, if I got an array markers, which looks like this:

I try to save it by the use of:

numpy.savetxt('markers.txt', markers)

In other script I try to open previously saved file:

markers = np.fromfile("markers.txt")

And that’s what I get…

Saved data first looks like this:

0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00
0.000000000000000000e+00

But when I save just loaded data by the use of the same method, ie. numpy.savetxt() it looks like this:

1.398043286095131769e-76
1.398043286095288860e-76
1.396426376485745879e-76
1.398043286055061908e-76
1.398043286095288860e-76
1.182950697433698368e-76
1.398043275797188953e-76
1.398043286095288860e-76
1.210894289234927752e-99
1.398040649781712473e-76

What am I doing wrong? PS there are no other “backstage” operation which I perform. Just saving and loading, and that’s what I get. Thank you in advance.


Answer 0

The most reliable way I have found to do this is to use np.savetxt with np.loadtxt and not np.fromfile which is better suited to binary files written with tofile. The np.fromfile and np.tofile methods write and read binary files whereas np.savetxt writes a text file. So, for example:

In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.savetxt('test1.txt', a, fmt='%d')
In [3]: b = np.loadtxt('test1.txt', dtype=int)
In [4]: a == b
Out[4]: array([ True,  True,  True,  True], dtype=bool)

Or:

In [5]: a.tofile('test2.dat')
In [6]: c = np.fromfile('test2.dat', dtype=int)
In [7]: c == a
Out[7]: array([ True,  True,  True,  True], dtype=bool)

I use the former method even if it is slower and creates bigger files (sometimes): the binary format can be platform dependent (for example, the file format depends on the endianness of your system).

There is a platform independent format for NumPy arrays, which can be saved and read with np.save and np.load:

In  [8]: np.save('test3.npy', a)    # .npy extension is added if not given
In  [9]: d = np.load('test3.npy')
In [10]: a == d
Out[10]: array([ True,  True,  True,  True], dtype=bool)

Answer 1

np.save('data.npy', num_arr) # save
new_num_arr = np.load('data.npy') # load

Answer 2

np.fromfile() has a sep= keyword argument:

Separator between items if file is a text file. Empty (“”) separator means the file should be treated as binary. Spaces (” ”) in the separator match zero or more whitespace characters. A separator consisting only of spaces must match at least one whitespace.

The default value of sep="" means that np.fromfile() tries to read it as a binary file rather than a space-separated text file, so you get nonsense values back. If you use np.fromfile('markers.txt', sep=" ") you will get the result you are looking for.

However, as others have pointed out, np.loadtxt() is the preferred way to convert text files to numpy arrays, and unless the file needs to be human-readable it is usually better to use binary formats instead (e.g. np.load()/np.save()).
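
A minimal round trip illustrating the point (a sketch; markers.txt is the file name from the question):

import numpy as np

markers = np.zeros(10)
np.savetxt('markers.txt', markers)

loaded = np.fromfile('markers.txt', sep=' ')   # parse as whitespace-separated text
print(np.array_equal(loaded, markers))         # True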


Answer 3

For a short answer, you should use np.save and np.load. The advantage of these is that they are made by the developers of the numpy library and they already work (plus they are likely already optimized nicely), e.g.

import numpy as np
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

np.save(path/'x', x)
np.save(path/'y', y)

x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')

print(x is x_loaded) # False
print(x == x_loaded) # [[ True  True  True  True  True]]

Expanded answer:

In the end it really depends on your needs, because you can also save it in a human readable format (see this Dump a NumPy array into a csv file) or even with other libraries if your files are extremely large (see this best way to preserve numpy arrays on disk for an expanded discussion).

However, (making an expansion since you use the word “properly” in your question) I still think using the numpy function out of the box (and most code!) most likely satisfies most user needs. The most important reason is that it already works. Trying to use something else for any other reason might take you on an unexpectedly LONG rabbit hole to figure out why it doesn’t work and force it to work.

Take for example trying to save it with pickle. I tried that just for fun and it took me at least 30 minutes to realize that pickle wouldn’t save my stuff unless I opened & read the file in bytes mode with wb. Took time to google, try things, understand the error message etc… Small detail, but the fact that it already required me to open a file complicated things in unexpected ways. Add to that that it required me to re-read this (which btw is sort of confusing): Difference between modes a, a+, w, w+, and r+ in built-in open function?

So if there is an interface that meets your needs, use it, unless you have a (very) good reason (e.g. compatibility with matlab, or for some reason you really want to read the file and printing in python really doesn’t meet your needs, which might be questionable). Furthermore, most likely if you need to optimize it you’ll find out later down the line (rather than spend ages debugging useless stuff like opening a simple numpy file).

So use the interface numpy provides. It might not be perfect, but it’s most likely fine, especially for a library that’s been around as long as numpy.

I already spent time saving and loading data with numpy in a bunch of ways, so have fun with it; hope it helps!

import numpy as np
import pickle
from pathlib import Path

path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)

lb,ub = -1,1
num_samples = 5
x = np.random.uniform(low=lb,high=ub,size=(1,num_samples))
y = x**2 + x + 2

# using save (to npy), savez (to npz)
np.save(path/'x', x)
np.save(path/'y', y)
np.savez(path/'db', x=x, y=y)
with open(path/'db.pkl', 'wb') as db_file:
    pickle.dump(obj={'x':x, 'y':y}, file=db_file)

## using loading npy, npz files
x_loaded = np.load(path/'x.npy')
y_load = np.load(path/'y.npy')
db = np.load(path/'db.npz')
with open(path/'db.pkl', 'rb') as db_file:
    db_pkl = pickle.load(db_file)

print(x is x_loaded)
print(x == x_loaded)
print(x == db['x'])
print(x == db_pkl['x'])
print('done')

Some comments on what I learned:

  • np.save as expected, this already compresses it well (see https://stackoverflow.com/a/55750128/1601580), works out of the box without any file opening. Clean. Easy. Efficient. Use it.
  • np.savez uses an uncompressed format (see docs: Save several arrays into a single file in uncompressed .npz format). If you decide to use this (you were warned to go away from the standard solution, so expect bugs!) you might discover that you need to use argument names to save it, unless you want to use the default names. So don’t use this if the first already works (or if anything else works, use that!)
  • Pickle also allows for arbitrary code execution. Some people might not want to use this for security reasons.
  • human readable files are expensive to make etc. Probably not worth it.
  • there is something called hdf5 for large files. Cool! https://stackoverflow.com/a/9619713/1601580

Note this is not an exhaustive answer. But for other resources check this:


Input and output numpy arrays to h5py

Question: Input and output numpy arrays to h5py

I have a Python code whose output is a sized matrix, whose entries are all of the type float. If I save it with the extension .dat the file size is of the order of 500 MB. I read that using h5py reduces the file size considerably. So, let’s say I have the 2D numpy array named A. How do I save it to an h5py file? Also, how do I read the same file and put it as a numpy array in a different code, as I need to do manipulations with the array?


Answer 0

h5py provides a model of datasets and groups. The former is basically arrays and the latter you can think of as directories. Each is named. You should look at the documentation for the API and examples:

http://docs.h5py.org/en/latest/quick.html

A simple example where you are creating all of the data upfront and just want to save it to an hdf5 file would look something like:

In [1]: import numpy as np
In [2]: import h5py
In [3]: a = np.random.random(size=(100,20))
In [4]: h5f = h5py.File('data.h5', 'w')
In [5]: h5f.create_dataset('dataset_1', data=a)
Out[5]: <HDF5 dataset "dataset_1": shape (100, 20), type "<f8">

In [6]: h5f.close()

You can then load that data back in using:

In [10]: h5f = h5py.File('data.h5','r')
In [11]: b = h5f['dataset_1'][:]
In [12]: h5f.close()

In [13]: np.allclose(a,b)
Out[13]: True

Definitely check out the docs:

http://docs.h5py.org

Writing to hdf5 file depends either on h5py or pytables (each has a different python API that sits on top of the hdf5 file specification). You should also take a look at other simple binary formats provided by numpy natively such as np.save, np.savez etc:

http://docs.scipy.org/doc/numpy/reference/routines.io.html


Answer 1

A cleaner way to handle file open/close and avoid memory leaks:

Prep:

import numpy as np
import h5py

data_to_write = np.random.random(size=(100,20)) # or some such

Write:

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset",  data=data_to_write)

Read:

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]

How to return 0 with divide by zero

Question: How to return 0 with divide by zero

I’m trying to perform an element-wise divide in python, but if a zero is encountered, I need the quotient to just be zero.

For example:

array1 = np.array([0, 1, 2])
array2 = np.array([0, 1, 1])

array1 / array2 # should be np.array([0, 1, 2])

I could always just use a for-loop through my data, but to really utilize numpy’s optimizations, I need the divide function to return 0 upon divide by zero errors instead of ignoring the error.

Unless I’m missing something, it doesn’t seem numpy.seterr() can return values upon errors. Does anyone have any other suggestions on how I could get the best out of numpy while setting my own divide by zero error handling?


Answer 0

In numpy v1.7+, you can take advantage of the “where” option for ufuncs. You can do things in one line and you don’t have to deal with the errstate context manager.

>>> a = np.array([-1, 0, 1, 2, 3], dtype=float)
>>> b = np.array([ 0, 0, 0, 2, 2], dtype=float)

# If you don't pass `out` the indices where (b == 0) will be uninitialized!
>>> c = np.divide(a, b, out=np.zeros_like(a), where=b!=0)
>>> print(c)
[ 0.   0.   0.   1.   1.5]

In this case, it does the divide calculation anywhere ‘where’ b does not equal zero. When b does equal zero, then it remains unchanged from whatever value you originally gave it in the ‘out’ argument.


Answer 1

Building on @Franck Dernoncourt’s answer, fixing -1 / 0:

def div0( a, b ):
    """ ignore / 0, div0( [-1, 0, 1], 0 ) -> [0, 0, 0] """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide( a, b )
        c[ ~ np.isfinite( c )] = 0  # -inf inf NaN
    return c

div0( [-1, 0, 1], 0 )
array([0, 0, 0])

Answer 2

Building on the other answers, and improving on:

Code:

import numpy as np

a = np.array([0,0,1,1,2], dtype='float')
b = np.array([0,1,0,1,3], dtype='float')

with np.errstate(divide='ignore', invalid='ignore'):
    c = np.true_divide(a,b)
    c[c == np.inf] = 0
    c = np.nan_to_num(c)

print('c: {0}'.format(c))

Output:

c: [ 0.          0.          0.          1.          0.66666667]

Answer 3

One-liner (throws warning)

np.nan_to_num(array1 / array2)

Answer 4

Try doing it in two steps. Division first, then replace.

with numpy.errstate(divide='ignore'):
    result = numerator / denominator
    result[denominator == 0] = 0

The numpy.errstate line is optional, and just prevents numpy from telling you about the “error” of dividing by zero, since you’re already intending to do so, and handling that case.
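
For example, with values like those in the question (a sketch; invalid='ignore' is added here so the 0/0 case is silenced as well):

import numpy as np

numerator = np.array([0., 1., 2.])
denominator = np.array([0., 1., 1.])

with np.errstate(divide='ignore', invalid='ignore'):
    result = numerator / denominator
    result[denominator == 0] = 0

print(result)   # [0. 1. 2.]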


Answer 5

You can also replace based on inf, only if the array dtypes are floats, as per this answer:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> c = a / b
>>> c
array([ inf,   2.,   1.])
>>> c[c == np.inf] = 0
>>> c
array([ 0.,  2.,  1.])

Answer 6

One answer I found searching a related question was to manipulate the output based upon whether the denominator was zero or not.

Suppose arrayA and arrayB have been initialized, but arrayB has some zeros. We could do the following if we want to compute arrayC = arrayA / arrayB safely.

In this case, whenever I have a divide by zero in one of the cells, I set the cell to be equal to myOwnValue, which in this case would be zero

myOwnValue = 0
arrayC = np.zeros(arrayA.shape)   # .shape is an attribute, not a method
indNonZeros = np.where(arrayB != 0)
indZeros = np.where(arrayB == 0)  # comparison, not assignment

# division in two steps: first with nonzero cells, and then zero cells
arrayC[indNonZeros] = arrayA[indNonZeros] / arrayB[indNonZeros]
arrayC[indZeros] = myOwnValue # Look at footnote

Footnote: In retrospect, this line is unnecessary anyways, since arrayC[i] is instantiated to zero. But if were the case that myOwnValue != 0, this operation would do something.


Answer 7

Another solution worth mentioning:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> b_inv = np.array([1/i if i!=0 else 0 for i in b])
>>> a*b_inv
array([0., 2., 1.])