# How to normalize a NumPy array to within a certain range?

## Question: How to normalize a NumPy array to within a certain range?

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

``````
# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)
``````

Is there a less verbose way to do this, such as a convenience function? `matplotlib.colors.Normalize()` doesn't seem to be related.

## Answer 0

``````
audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())
``````

Using `/=` and `*=` allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

``````
image *= 255.0/image.max()    # Uses 1 division and image.size multiplications
``````

is marginally faster than

``````
image /= image.max()/255.0    # Uses 1+image.size divisions
``````

Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.
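If you want to check the multiplication-versus-division claim on your own machine, a rough `timeit` sketch along these lines may help. The array size, the repeat count, and the helper names `scale_by_multiply` and `scale_by_divide` are arbitrary choices for illustration, and the measured gap depends on hardware and the NumPy build, so it may well be negligible.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
base = rng.random((512, 512))  # made-up stand-in for a float image


def scale_by_multiply():
    image = base.copy()
    image *= 255.0 / image.max()   # one division, then image.size multiplications
    return image


def scale_by_divide():
    image = base.copy()
    image /= image.max() / 255.0   # one division, then image.size divisions
    return image


t_mul = timeit.timeit(scale_by_multiply, number=50)
t_div = timeit.timeit(scale_by_divide, number=50)
print(f"multiply: {t_mul:.4f} s, divide: {t_div:.4f} s")
```

Both helpers produce the same values up to rounding, so the choice between them is purely about speed.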

In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the `audio` and `image` arrays need to have a floating-point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you'll need to convert them using `astype`. For example,

``````
image = image.astype('float64')
``````
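As a small illustration of the dtype point, assume an 8-bit image stored as `uint8`. The sample values and the helper name `scale_to_255` are made up for this sketch.

```python
import numpy as np


def scale_to_255(image):
    """Return a float64 copy of `image` scaled so that its maximum is 255."""
    scaled = image.astype('float64')   # astype returns a new float array
    scaled *= 255.0 / scaled.max()     # in-place, so the dtype stays float64
    return scaled


image = np.array([[0, 10], [20, 40]], dtype=np.uint8)
scaled = scale_to_255(image)
print(scaled.dtype)   # float64
print(image.dtype)    # uint8, the original array is left untouched

# A float32 array keeps its dtype through the in-place operation
small = np.array([0.5, 2.0], dtype=np.float32)
small *= 255.0 / small.max()
print(small.dtype)    # float32
```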

## Answer 1

If the array contains both positive and negative data, I'd go with:

``````
import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the parentheses before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1
``````
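The three cases above are instances of one linear map, so they can be folded into a single helper. This is only a sketch: the name `rescale` and its parameters `new_min` and `new_max` are hypothetical, and the guard against constant arrays is an added assumption rather than part of the answer.

```python
import numpy as np


def rescale(a, new_min=0.0, new_max=1.0):
    """Linearly map the values of `a` onto the interval [new_min, new_max]."""
    a = np.asarray(a, dtype=float)
    span = np.ptp(a)                      # max - min over the whole array
    if span == 0:
        # Every value is identical, so there is no range to stretch
        raise ValueError("cannot rescale a constant array")
    return new_min + (a - a.min()) * (new_max - new_min) / span


a = np.random.rand(3, 2)
b = rescale(a)                 # [0, 1]
d = rescale(a, -1.0, 1.0)      # [-1, 1]
```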

If the array contains `nan`, one solution is to simply exclude them when computing the range:

``````
def nan_ptp(a):
    # Peak-to-peak range computed over the finite entries only
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)
``````

However, depending on the context you might want to treat `nan` differently, e.g. interpolate the value, replace it with e.g. 0, or raise an error.
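A rough sketch of three of these choices follows. The sample data and the helper name `normalise_strict` are invented for illustration, and interpolation is left out because it depends on how the data is laid out.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 5.0])

# Option 1: ignore nan when computing the range; nan stays nan in the result
lo, hi = np.nanmin(a), np.nanmax(a)
kept = (a - lo) / (hi - lo)

# Option 2: replace nan with 0 first, then normalise as usual
filled = np.where(np.isnan(a), 0.0, a)
replaced = (filled - filled.min()) / np.ptp(filled)


# Option 3: refuse to normalise data that contains nan
def normalise_strict(arr):
    if np.isnan(arr).any():
        raise ValueError("array contains nan")
    return (arr - arr.min()) / np.ptp(arr)
```

Note that option 2 changes the minimum of the data whenever 0 lies outside the original range, which may or may not be what you want.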

Finally, worth mentioning even if it's not OP's question, standardization:

``````
e = (a - np.mean(a)) / np.std(a)
``````

## Answer 2

You can also rescale using `sklearn`. The advantages are that you can normalize the standard deviation in addition to mean-centering the data, and that you can do this along either axis, by features or by records.

``````
from sklearn.preprocessing import scale
X = scale(X, axis=0, with_mean=True, with_std=True, copy=True)
``````

The keyword arguments `axis`, `with_mean`, and `with_std` are self-explanatory and are shown with their default values. Setting `copy` to `False` asks for the operation to be performed in place. See the `sklearn.preprocessing.scale` documentation for details.
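To make the `axis` argument concrete, here is a plain-NumPy sketch of the arithmetic that mean-centering and unit-variance scaling amount to. The helper `standardize` is hypothetical and is not sklearn's implementation; it assumes the population standard deviation (`ddof=0`) and that no column or row is constant.

```python
import numpy as np


def standardize(X, axis=0):
    """Subtract the mean and divide by the standard deviation along `axis`."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)   # ddof=0 by default
    return (X - mean) / std


X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])

by_feature = standardize(X, axis=0)   # each column gets mean 0 and std 1
by_record = standardize(X, axis=1)    # each row gets mean 0 and std 1
```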

## Answer 3

You can use the "i" (as in idiv, imul..) version, and it doesn't look half bad:

``````
image /= (image.max()/255.0)
``````

For the other case you can write a function to normalize a two-dimensional array by columns:

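``````
def normalize_columns(arr):
    # Scale each column in place so that its largest magnitude becomes 1
    rows, cols = arr.shape
    for col in range(cols):  # range replaces Python 2's xrange
        arr[:,col] /= abs(arr[:,col]).max()
``````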
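One thing the loop does not cover is a column made up entirely of zeros (a silent audio channel, say), where the division yields `nan`. A vectorised sketch that leaves such columns untouched could look like this; the name `normalize_columns_safe` is hypothetical, and the array is assumed to be two-dimensional with a floating-point dtype.

```python
import numpy as np


def normalize_columns_safe(arr):
    """Scale each column of a 2-D float array in place to a peak magnitude of 1.

    Columns that contain only zeros are left unchanged.
    """
    peaks = np.abs(arr).max(axis=0)   # one peak value per column
    peaks[peaks == 0] = 1.0           # dividing by 1 keeps all-zero columns as they are
    arr /= peaks                      # broadcasts across rows
    return arr


audio = np.array([[1.0, 0.0],
                  [-4.0, 0.0]])
normalize_columns_safe(audio)
print(audio)
```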

## Answer 4

You are trying to min-max scale the values of `audio` between -1 and +1 and `image` between 0 and 255.

Using `sklearn.preprocessing.minmax_scale` should easily solve your problem.

e.g.:

``````
from sklearn.preprocessing import minmax_scale
audio_scaled = minmax_scale(audio, feature_range=(-1,1))
``````

and

``````
shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)
``````
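One difference from the question's code may be worth keeping in mind: min-max scaling maps each column's minimum and maximum onto the ends of the range, whereas dividing by the peak magnitude keeps zero at zero. The following is a plain-NumPy sketch of the two mappings with made-up sample values; it mimics the per-column arithmetic and is not sklearn's own code.

```python
import numpy as np

audio = np.array([[-0.5, 0.1],
                  [0.0, 0.0],
                  [0.25, -0.2]])

# Peak normalisation, as in the question: divide by the largest magnitude
peak = audio / np.abs(audio).max(axis=0)

# Per-column min-max mapping onto [-1, 1]
col_min = audio.min(axis=0)
col_span = audio.max(axis=0) - col_min
minmax = 2.0 * (audio - col_min) / col_span - 1.0

print(peak[1])     # the silent sample stays at zero
print(minmax[1])   # the silent sample is shifted away from zero
```

For audio, that shift amounts to a DC offset, so which mapping is appropriate depends on what the data represents.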

Note: this is not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.

## Answer 5

A simple solution is using the scalers offered by the sklearn.preprocessing library.

``````
import sklearn.preprocessing as sk  # assumed alias; the original snippet does not show its import

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)
``````

The reconstruction error `X_rec - X` will be zero, up to floating-point rounding. You can adjust `feature_range` to your needs, or use a standard scaler, `sk.StandardScaler()`, instead.

## Answer 6

I tried following this, and got the error

``````
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
``````

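The `numpy` array I was trying to normalize was an `integer` array. It seems that since version `1.10` NumPy no longer casts a float result back into an integer array during in-place operations (the default casting rule became `same_kind`), so you have to use `numpy.true_divide()`, which returns a new float array, to resolve that.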

``````
arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)
``````

`img` was a `PIL.Image` object.
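The behaviour can be reproduced without an image file. In this sketch a small integer array stands in for `np.array(img)`; the sample values are made up.

```python
import numpy as np

arr = np.array([[0, 128], [255, 64]])   # integer dtype, stand-in for np.array(img)

failed = False
try:
    arr /= 255.0                # in-place true division into an integer array
except TypeError as exc:
    failed = True
    print("in-place division failed:", type(exc).__name__)

normalized = np.true_divide(arr, 255.0)   # returns a new float64 array
same = arr / 255.0                        # on arrays, `/` performs true division too
```

Since the plain `/` operator also allocates a new float array, writing `arr = arr / 255.0` should be an equivalent way around the error.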