



I would like to normalize a NumPy array to unit length. More specifically, I am looking for a built-in equivalent of this function:

import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v  # the zero vector has no direction; return it unchanged
    return v / norm

Is there something like that in sklearn or NumPy?

This function also handles the case where v is the zero vector.

Answer 0



If you’re using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
# normalize expects a 2D array, so treat x as a single column
norm2 = normalize(x[:, np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))
# True

Answer 1




I agree that it would be nice if such a function were part of the included batteries, but as far as I know it isn't. Here is a version that works along an arbitrary axis and is fully vectorized, so it avoids Python-level loops.

import numpy as np

def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3,3,3)
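
As a rough usage sketch (the array B and its values are made up for illustration), the function above can be applied along different axes, and an all-zero vector passes through unchanged instead of turning into NaN:

```python
import numpy as np

def normalized(a, axis=-1, order=2):
    # Same function as in the answer, repeated so the sketch runs on its own.
    norms = np.atleast_1d(np.linalg.norm(a, order, axis))
    norms[norms == 0] = 1  # zero vectors are divided by 1, i.e. left as-is
    return a / np.expand_dims(norms, axis)

# Illustrative data: the second row is the zero vector.
B = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

rows = normalized(B, axis=1)  # each row scaled to unit L2 length
cols = normalized(B, axis=0)  # each column scaled to unit L2 length

print(rows)
print(np.linalg.norm(cols, axis=0))
```

Replacing zero norms with 1 before dividing is what keeps the zero row intact; the price is that the output is no longer guaranteed to have unit length everywhere.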


Answer 2



You can specify ord to get the L1 norm. To avoid division by zero I use eps, though that may not be ideal.

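import numpy as np

def normalize(v):
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        # fall back to machine epsilon (v must have a floating-point dtype)
        norm = np.finfo(v.dtype).eps
    return v / norm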

Answer 3



This might also work for you:

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but it fails when v has length 0 (the zero vector): the division is 0/0, so the result is filled with NaN.

Answer 4



If you have multidimensional data and want each column shifted to start at zero and then scaled by either its sum or its range (so that its maximum is 1):

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d

This uses NumPy's peak-to-peak function, np.ptp.

a = np.random.random((5, 3))

# note: copy=False normalizes a in place
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

Answer 5


There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))

Answer 6



You mentioned scikit-learn, so I want to share another solution.

scikit-learn MinMaxScaler

scikit-learn has a class called MinMaxScaler that rescales values to a range of your choice.

It also handles NaN values for us.

NaNs are treated as missing values: disregarded in fit, and maintained in transform. … see reference [1]

Code sample

The code is simple:

# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler instance
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
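
To make the quoted NaN behavior concrete, here is a minimal NumPy-only sketch of min-max scaling. It is not scikit-learn's implementation; the helper name min_max_scale and the sample data are assumptions for illustration. NaNs are skipped when the column minimum and maximum are computed and stay NaN in the output:

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    # Hypothetical helper, not part of scikit-learn.
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = np.nanmin(X, axis=0)              # NaNs ignored when "fitting"
    col_range = np.nanmax(X, axis=0) - col_min
    col_range[col_range == 0] = 1.0             # constant columns: avoid 0/0
    return (X - col_min) / col_range * (hi - lo) + lo

# Made-up training data with one missing value in the second column.
X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [3.0, 30.0]])

X_scaled = min_max_scale(X_train)
print(X_scaled)
```

Unlike the normalizations in the other answers, this rescales each column to a fixed range rather than to unit norm.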

Answer 7



Without sklearn, using just NumPy, you can define a function.

Assuming that the rows are the variables and the columns are the samples (axis=1):

import numpy as np

# Example array
X = np.array([[1, 2, 3], [4, 5, 6]])

def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)

Input X:

array([[1, 2, 3],
       [4, 5, 6]])

Output of stdmtx(X):

array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])
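
One consequence of the np.nan_to_num call is worth a quick check: a row with zero variance gives 0/0 = NaN, which is then mapped to 0. The sketch below uses a made-up array Y and silences the expected floating-point warning with np.errstate:

```python
import numpy as np

def stdmtx(X):
    # Same standardization as above: rows are variables, columns are samples.
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)

# Illustrative data: the second row is constant, so its standard deviation is 0.
Y = np.array([[1.0, 2.0, 3.0],
              [5.0, 5.0, 5.0]])

with np.errstate(divide="ignore", invalid="ignore"):
    Z = stdmtx(Y)
print(Z)
```

Whether mapping a constant variable to all zeros is the right choice depends on the application; it is simply what this function does.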

Answer 8



If you want to normalize n-dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()

Answer 9



If you’re working with 3D vectors, you can do this concisely using the toolbelt vg. It’s a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.allclose(norm1, norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

Answer 10



If you don’t need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)
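
The precision caveat can be illustrated with a small sketch (the wrapper name normalize_eps and the vectors are made up). For ordinary vectors the 1e-16 offset is lost in floating-point rounding, but for a vector whose norm is comparable to 1e-16 the result comes out noticeably shorter than unit length:

```python
import numpy as np

def normalize_eps(v, eps=1e-16):
    # eps keeps the denominator nonzero; name and default are illustrative.
    return v / (np.linalg.norm(v) + eps)

zero = normalize_eps(np.zeros(3))               # no NaN, stays all zeros
ordinary = normalize_eps(np.array([3.0, 4.0]))  # norm 5: eps is negligible
tiny = normalize_eps(np.array([3e-16, 4e-16]))  # norm 5e-16: eps matters

print(np.linalg.norm(ordinary))
print(np.linalg.norm(tiny))
```

In the last case the denominator is 5e-16 + 1e-16, so the output has norm of roughly 5/6 instead of 1.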

Answer 11




If you work with multidimensional arrays, the following fast solution is possible.

Say we have a 2D array that we want to normalize along the last axis, and some rows have zero norm.

import numpy as np
arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
# divide only the rows with nonzero norm; zero rows are left untouched
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
#  [0.         0.         0.        ]
#  [0.47673129 0.57207755 0.66742381]]
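
A possible alternative, sketched here on the same example data, is to let np.divide skip the zero-norm rows through its where and out arguments. This returns a new array instead of modifying arr in place:

```python
import numpy as np

arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)

# keepdims=True gives shape (3, 1), so the norms broadcast across each row.
lengths = np.linalg.norm(arr, axis=-1, keepdims=True)

# Where the condition is False, the value already in `out` (zero) is kept.
unit = np.divide(arr, lengths, out=np.zeros_like(arr), where=lengths > 0)
print(unit)
```

Passing out explicitly matters here: without it, the entries skipped by where would be left uninitialized.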