Vector Norm in Machine Learning – An Introduction

“What is a vector norm, and why on earth do I need to know that?” Let’s assume you start by asking that question while seeing the glass half empty instead of half full. So I am here to convince you. If this introduction cannot convince you, then simply do not proceed, as it would be a waste of your time!


In Machine Learning, we deal with evaluations all the time, and needless to say, they usually involve Linear Algebra, vectors, and matrices. When evaluating an element, such as a loss function, you often need to summarize everything in one number. For scalars, you use the mean, the standard deviation, etc. BUT how do you deal with vectors? How would you express the importance of a vector in terms of its elements? How would you evaluate the vector elements in terms of their contributions to the outcome? We need some metric that captures the magnitude of a vector, so we can see how much it affects the algorithm’s optimization, evaluation performance, etc. Here is where norms come into play.

In this article, I will explain vector norms. You will see why they are essential and how to interpret them. Furthermore, you will learn how to implement them. I start with the definitions and end with matrix norms. By the end of this article, you should know:

  • What is the definition of the norm?
  • The norms’ properties
  • The most commonly used vector norms
  • Definition of the matrix norm
  • How to implement them in Python?

What is a Vector Norm?

We usually use norms for vectors and rarely for matrices. First, let’s define what the norm of a vector is. A norm can be described as below:

  • A function that operates on a vector (matrix) and returns a scalar element.
  • A norm is denoted by \mathcal{L}^{q}, in which q shows the order of the norm, with q \geq 1, q \in \mathbb{R}.
  • The norm of a vector (matrix) is always a non-negative value.
  • The intuition behind the norm is to measure a kind of distance.

A norm is mathematically defined as below:

    \[\mathcal{L}^q(\mathbf{v})=\left | \left | \mathbf{v} \right | \right |_q = \left ( \sum_{k}\left | v_k \right |^q \right )^{1/q}\]

The sign |.| is an operator that outputs the absolute value of its argument; for example, |-2|=2 and |2|=2. You can implement \mathcal{L}^{q} with the following Python code:

# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm


# Define a vector
v = np.array([2,3,1,0])

# Take the q-norm which q=2
q = 2
v_norm = norm(v, ord=q)

# Print values
print('The vector: ', v)
print('The vector norm: ', v_norm)

You should get the following output:

The vector:  [2 3 1 0]
The vector norm:  3.7416573867739413

The Norm Function Properties

It is crucial to know the norms’ properties, as we may need them in mathematical computations, especially mathematical proofs. A norm function \mathcal{L}(.) has the following properties:

  1. If the norm is zero, then the vector is all zero as well: \mathcal{L}(\mathbf{v})=0 \Rightarrow \mathbf{v}=0
  2. \mathcal{L}(\mathbf{v+w}) \leq \mathcal{L}(\mathbf{v}) + \mathcal{L}(\mathbf{w}) . This is called triangle inequality.
  3. \forall \beta \in \mathbb{R}: \mathcal{L}(\beta \mathbf{v})=|\beta|\mathcal{L}(\mathbf{v})

Proving the Properties (Advanced)

Property (1) is easy to prove. Since the norm sums the absolute values of the vector elements raised to a power, the only way the norm can be zero is if all the vector elements are zero.

We can prove property (3) as below:

    \[\mathcal{L}^q(\beta \mathbf{v})= \left ( \sum_{k}\left |\beta v_k \right |^q \right )^{1/q} = \left ( |\beta|^q\sum_{k}\left | v_k \right |^q \right )^{1/q}=\]

    \[|\beta| \left (\sum_{k}\left | v_k \right |^q \right )^{1/q} = |\beta| \mathcal{L}^q(\mathbf{v})\]
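The homogeneity property we just proved is easy to confirm numerically as well. Here is a minimal sketch; the particular values of beta and the vector size are arbitrary choices for illustration:

```python
# Numerical check of absolute homogeneity: ||beta * v|| == |beta| * ||v||
import numpy as np
from numpy.linalg import norm

q = 2       # Order of the norm
beta = -3.5 # An arbitrary scalar
v = np.random.rand(5)

lhs = norm(beta * v, ord=q)
rhs = abs(beta) * norm(v, ord=q)
print('Equal up to floating-point error:', np.isclose(lhs, rhs))
```

The two sides agree up to floating-point rounding, which is why we compare them with np.isclose rather than ==.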

We can prove property (2) mathematically as well, BUT let’s check it empirically instead. Let’s define our experiment as below:

  1. We randomly generate two vectors \mathbf{v} and \mathbf{w}
  2. We check whether the property holds.
  3. Repeat the experiment for E=100 times.

This is NOT a scientific proof. However, the more experiments we run without finding a counterexample, the more confident we can be that the property holds.

Let’s do it in Python:

# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm

# Repeat experiments
E = 100  # Number of experiments
q = 2    # Order of the norm
for i in range(E):
  # Define two random vectors of shape (1,5). Obviously v does not equal w!!
  v = np.random.rand(1,5)
  w = np.random.rand(1,5)

  propertyCheck = norm(v+w, ord=q) <= norm(v, ord=q) + norm(w, ord=q)
  if not propertyCheck:
    print('Property is NOT correct')

So if property (2) holds in every experiment, we expect the above Python code to print NOTHING. Think about why!

Most Commonly Used Norms

In the previous section, I described what a norm is in general, and we implemented it in Python. Here, I would like to discuss the norms most commonly used in Machine Learning.

\mathcal{L}^{1} Norm

The \mathcal{L}^{1} norm is simply the sum of the absolute values of a vector’s elements. The mathematical formulation is as below:

    \[\left | \left | \mathbf{v} \right | \right |_1 = \sum_{k}\left | v_k \right |\]

In Machine Learning, we usually use the \mathcal{L}^{1} norm when the sparsity of a vector matters, i.e., when the essential factor is the number of non-zero elements. BUT why? The \mathcal{L}^{1} norm penalizes the non-zero elements directly by adding up their absolute values.
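As a quick sketch, we can compute the \mathcal{L}^{1} norm both by direct summation and with NumPy; the vector values here are arbitrary choices:

```python
# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm

# Define a vector (note the negative element: L1 uses absolute values)
v = np.array([2, -3, 1, 0])

# L1 norm by direct summation of absolute values
l1_manual = np.sum(np.abs(v))

# L1 norm with NumPy: ord=1
l1_numpy = norm(v, ord=1)

print('L1 norm (manual): ', l1_manual)  # 6
print('L1 norm (numpy): ', l1_numpy)    # 6.0
```

Both approaches return 6, since |2| + |-3| + |1| + |0| = 6.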

\mathcal{L}^{2} Norm

The \mathcal{L}^{2} norm is also called the Euclidean norm; it is the Euclidean distance of a vector from the origin.


We can simply calculate it as below:

    \[\left | \left | \mathbf{v} \right | \right |_2 = \sqrt{\sum_{k}\left | v_k \right |^2}\]

The \mathcal{L}^{2} is commonly used in Machine Learning due to being differentiable, which is crucial for optimization purposes.

Let’s calculate the \mathcal{L}^{2} norm of a random vector with Python using two approaches. Both should lead to the same result:

# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm

# Defining a random vector
v = np.random.rand(1,5)

# Calculate L-2 norm
sum_square = 0
for i in range(v.shape[1]):
  # Accumulate the square of each vector element
  sum_square += np.square(v[0,i])
L2_norm_approach_1 = np.sqrt(sum_square)

# Calculate L-2 norm using numpy
L2_norm_approach_2 = norm(v, ord=2)

print('L2_norm: ', L2_norm_approach_1)
print('L2_norm with numpy:', L2_norm_approach_2)

You should get the same results!

Max Norm

Well, you may not see this norm very often. However, it is a definition you should be familiar with. The max norm is denoted by \mathcal{L}^\infty, and its mathematical formulation is as below:

    \[\left | \left | \mathbf{v} \right | \right |_\infty = \max_{k} \left | v_k \right |\]

It simply returns the maximum absolute value in the vector elements.
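As a short sketch, the max norm can be computed manually or with NumPy by passing ord=np.inf; the vector values are arbitrary choices:

```python
# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm

# Define a vector (the element with the largest absolute value is -7)
v = np.array([2, -7, 1, 0])

# Max norm: the largest absolute value among the elements
max_manual = np.max(np.abs(v))

# Max norm with NumPy: ord=np.inf
max_numpy = norm(v, ord=np.inf)

print('Max norm (manual): ', max_manual)  # 7
print('Max norm (numpy): ', max_numpy)    # 7.0
```

Note that the sign is ignored: the max norm of this vector is 7, coming from the element -7.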

Norm of a Matrix

For calculating the norm of a matrix, the usual choice is the Frobenius norm, which is very similar to the \mathcal{L}^{2} norm of a vector and is defined as below:

    \[\left | \left | \mathbf{M} \right | \right |_F = \sqrt{\sum_{i,j}\left | M_{i,j} \right |^2}\]
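Here is a minimal sketch computing the Frobenius norm of a small matrix, both manually and with NumPy; the matrix entries are arbitrary choices:

```python
# Import Numpy package and the norm function
import numpy as np
from numpy.linalg import norm

# Define a small matrix
M = np.array([[1, 2],
              [3, 4]])

# Frobenius norm: square root of the sum of squared entries
fro_manual = np.sqrt(np.sum(np.square(M)))

# Frobenius norm with NumPy: ord='fro'
fro_numpy = norm(M, ord='fro')

print('Frobenius norm (manual): ', fro_manual)
print('Frobenius norm (numpy): ', fro_numpy)
```

Both approaches give sqrt(1 + 4 + 9 + 16) = sqrt(30).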

Conclusion

In this article, you learned what norms are and how to implement them. I wish I could tell you this is the end and that’s all you need! BUT no! Learning is an active process. You will see many tutorials and concepts regarding vector norms. Here, I tried to address what I believe you will see most frequently. There is always MORE! You can start with what I just provided. Make sure to play with the Python code; I assure you it will help you gain a better understanding. I hope you find this useful. Please don’t forget to share your thoughts with me.
