This tutorial is dedicated to the concept of linear independence of vectors and its connection with solving systems of linear equations.

What is the concept of vectors’ linear independence, and why should you care about it? One simple example: let’s say you want to find the solution of \mathbf{A} \mathbf{x}=\mathbf{b}. One may say we can simply find the answer by taking the inverse of \mathbf{A}, as \mathbf{x}= \mathbf{A}^{-1}\mathbf{b}. NOT SO FAST! There is more to it. What if \mathbf{A} does not have an inverse, or is not even square? Do you just let it go? I guess not.

That was just one example. In Machine Learning, you frequently want to explore how vectors relate to one another in order to analyze them better. For example, you may be dealing with a matrix, which in essence is formed by vectors. What would you be able to conclude about it if you were not familiar with the concept of linear independence?

In this tutorial, you will learn the following:

Before You Move On

You may find the following resources helpful to better understand the concept of this article:

The Concept of Linear Independence

Assume we have the set \{\mathbf{v}^{1},\mathbf{v}^{2},\hdots,\mathbf{v}^{p}\} \subset \mathbb{R}^{q}, i.e., p column vectors of size q. We call this set linearly independent if no vector in it can be represented as a linear combination of the remaining vectors. Perhaps it is easier to define linear dependence first: a vector \mathbf{v}^{i} is linearly dependent on the others if we can express it as a linear combination of the remaining vectors in the set, as below:

    \[\mathbf{v}^{i} = \sum_{j \neq i} \alpha_{j} \mathbf{v}^{j}\]

In the above case, we say the set of vectors \{\mathbf{v}^{1},\mathbf{v}^{2},\hdots,\mathbf{v}^{p}\} is linearly dependent!

Vector d is a linear combination of vectors a, b, and c. Actually, d = a + b + c.
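The situation in the figure can be reproduced numerically. A minimal sketch, with made-up values for a, b, and c (the figure’s exact vectors are not given, so these are hypothetical):

```python
import numpy as np

# Hypothetical example vectors (the figure's exact values are not given)
a = np.array([1, 0, 1])
b = np.array([0, 2, 1])
c = np.array([3, 1, 0])
# d is a linear combination of a, b, and c with all coefficients equal to 1
d = a + b + c
print(d)  # [4 3 2]
```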


Consider the three vectors below:

    \[\mathbf{v}= \begin{bmatrix} 1\\  -1\\  2 \end{bmatrix} , \mathbf{u}= \begin{bmatrix} 0\\  3\\  1 \end{bmatrix} , \mathbf{w}= \begin{bmatrix} 2\\  1\\  5 \end{bmatrix}\]

The above set is linearly dependent. Why? It is simple: \mathbf{w} = 2\mathbf{v} + \mathbf{u}. Let’s verify that with Python and Numpy:

# Import Numpy library
import numpy as np
# Define three column vectors
v = np.array([1, -1, 2]).reshape(-1,1)
u = np.array([0, 3, 1]).reshape(-1,1)
w = np.array([2, 1, 5]).reshape(-1,1)
# Check the linear dependence by verifying the equality
print('Does the equality w = 2v + u hold? Answer:', np.all(w == 2*v + u))

Run the above code and see if Numpy confirms that or not!
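A complementary check, sketched below with numpy.linalg.matrix_rank: stack the three vectors as columns of a matrix; if the rank is smaller than the number of vectors, the set is linearly dependent.

```python
import numpy as np
from numpy.linalg import matrix_rank

# Stack v, u, w as the columns of a 3x3 matrix
A = np.column_stack(([1, -1, 2], [0, 3, 1], [2, 1, 5]))
# A rank below 3 (the number of vectors) confirms linear dependence
print('rank =', matrix_rank(A))  # rank = 2
print('Linearly dependent?', matrix_rank(A) < A.shape[1])  # True
```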

The Relationship With Matrix Rank

So far, I have talked about the linear dependence of vectors. Assume we have a matrix \mathbf{M} \in \mathbb{R}^{m \times n}, i.e., with m rows and n columns. Let’s focus on the columns. The n columns form n vectors; I denote the j^{th} column by \mathbf{M}_{:,j}. So the columns give us the set of n vectors \{\mathbf{M}_{:,1},\mathbf{M}_{:,2},\hdots,\mathbf{M}_{:,n}\}. The size of the largest subset of \{\mathbf{M}_{:,1},\mathbf{M}_{:,2},\hdots,\mathbf{M}_{:,n}\} whose vectors are linearly independent is called the column rank of the matrix \mathbf{M}. Likewise, the size of the largest subset of rows that forms a linearly independent set is called the row rank of the matrix \mathbf{M}. The row rank and column rank are always equal, so we simply speak of the rank of \mathbf{M}.

We have the following properties for matrix ranks:

  1. For \mathbf{M} \in \mathbb{R}^{m \times n}, rank(M) \leq min(m,n). If rank(M) = min(m,n), the matrix \mathbf{M} is full rank.
  2. For \mathbf{M} \in \mathbb{R}^{m \times n}, \mathbf{N} \in \mathbb{R}^{n \times q}, and \mathbf{H=MN} \in \mathbb{R}^{m \times q}, rank(H) \leq min(rank(M),rank(N)).
  3. For \mathbf{M} \in \mathbb{R}^{m \times n}, \mathbf{N} \in \mathbb{R}^{m \times n}, and \mathbf{H= M + N} \in \mathbb{R}^{m \times n}, rank(H) \leq rank(M) + rank(N).

Check the first and second properties above with the following code:

# Import Numpy library
import numpy as np
from numpy.linalg import matrix_rank
# Define random integer matrices of shapes 3x4 and 4x3
M = np.random.randint(10, size=(3, 4))
N = np.random.randint(10, size=(4, 3))
# Property (1): rank(M) <= min(m, n)
if matrix_rank(M) <= min(M.shape):
    print('Property rank(M) <= min(m, n) is confirmed!')
# Property (2): rank(MN) <= min(rank(M), rank(N))
if matrix_rank(np.matmul(M, N)) <= min(matrix_rank(M), matrix_rank(N)):
    print('Property rank(MN) <= min(rank(M), rank(N)) is confirmed!')

Practice: Modify the above code and check property (3).

Linear Equations

I have talked about linear dependence and matrix rank. Now I would like to discuss their application in finding the solution of linear equations, which is of great importance. Consider the following equality, which defines a system of linear equations:

    \[\mathbf{A}^{m \times n}\mathbf{x}^{n \times 1}=\mathbf{b}^{m \times 1}\]

Above, we see the matrix \mathbf{A} multiplied by the vector \mathbf{x}, forming another vector \mathbf{b}. The equality encodes a set of m linear equations. Let’s write equation j, for example:

    \[\mathbf{A}_{j,1} \mathbf{x}_{1} + \mathbf{A}_{j,2} \mathbf{x}_{2} + \hdots + \mathbf{A}_{j,n} \mathbf{x}_{n} = \mathbf{b}_{j} , j \in \{1,2,\hdots,m\}\]

So the question is: how many solutions exist for the system of equations \mathbf{A} \mathbf{x}=\mathbf{b}? By a solution, I mean the possible values for the variables \mathbf{x}=[x_1,x_2,\hdots,x_n]^T. The answer is one of the following:

  • There is NO solution.
  • The system has one unique solution.
  • There are infinitely many solutions.

As you observed, having more than one but finitely many solutions is off the table!
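As a small sketch of the unique-solution case: when \mathbf{A} is square and full rank, np.linalg.solve returns the single \mathbf{x} (the matrix below is a made-up example):

```python
import numpy as np

# A is 2x2 with rank 2 (full rank), so Ax = b has exactly one solution
A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([5., 10.])
x = np.linalg.solve(A, b)
print(x)  # [1. 3.]
# Verify the solution satisfies Ax = b
print(np.allclose(A @ x, b))  # True
```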

Theorem: If \mathbf{u} and \mathbf{v} are two solutions of the equation \mathbf{A} \mathbf{x}=\mathbf{b}, then any linear combination of them of the following specific form is a solution as well:

    \[\mathbf{w} = \beta \mathbf{u} + (1-\beta) \mathbf{v}, \forall \beta \in \mathbb{R}\]

PROOF: Look at the below equations to see how \mathbf{w} is also a solution:

    \[\mathbf{A}\mathbf{w} = \mathbf{A}\left( \beta \mathbf{u} + (1-\beta)\mathbf{v}\right)= \beta \mathbf{A}\mathbf{u} + \mathbf{A}\mathbf{v} - \beta \mathbf{A}\mathbf{v} = \beta\mathbf{b} + \mathbf{b} - \beta \mathbf{b} = \mathbf{b}\]

So the above proof shows that if we have more than one solution, then we have an infinite number of solutions!
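The theorem can also be checked numerically. A minimal sketch, using a made-up underdetermined system (one equation, two unknowns) for which two different solutions are easy to find by inspection:

```python
import numpy as np

# One equation, two unknowns: x1 + x2 = 2 has infinitely many solutions
A = np.array([[1., 1.]])
b = np.array([2.])
# Two distinct solutions, found by inspection
u = np.array([2., 0.])
v = np.array([0., 2.])
# Any combination w = beta*u + (1-beta)*v is also a solution,
# for every real beta (not just beta in [0, 1])
for beta in [-1.0, 0.5, 3.0]:
    w = beta * u + (1 - beta) * v
    print(beta, np.allclose(A @ w, b))  # True for every beta
```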

The following two conditions characterize the number of solutions. Try to prove them based on what you have learned so far:

  • \mathbf{A}^{m \times n}\mathbf{x}^{n \times 1}=\mathbf{b}^{m \times 1} has at least one solution if m \leq n and Rank(\mathbf{A}) = m.
  • \mathbf{A}^{m \times n}\mathbf{x}^{n \times 1}=\mathbf{b}^{m \times 1} has exactly one solution if m = n and Rank(\mathbf{A}) = m.
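A quick numerical sanity check of the first condition, sketched with a made-up 2x3 matrix of rank 2; np.linalg.lstsq finds one of the infinitely many solutions of the underdetermined system:

```python
import numpy as np
from numpy.linalg import matrix_rank, lstsq

# m = 2 < n = 3, and rank(A) = 2 = m, so at least one solution exists
A = np.array([[1., 0., 2.],
              [0., 1., 1.]])
b = np.array([3., 4.])
print('rank(A) = m?', matrix_rank(A) == A.shape[0])  # True
# lstsq returns the minimum-norm solution; since rank(A) = m,
# it satisfies Ax = b exactly
x, *_ = lstsq(A, b, rcond=None)
print(np.allclose(A @ x, b))  # True
```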


In this tutorial, I discussed the concept of linear independence of vectors and its connection with systems of linear equations. This concept is crucial, especially in Machine Learning and optimization theory, where we deal with all sorts of mathematical proofs necessary to justify why a method should work! For the majority of what you may work on in Machine Learning, you may not need to use what I talked about here directly. BUT, you need to know it if you would like to stand out.

Do you see any ambiguous concept here? Anything missing or wrong? Feel free to ask, as it will help me, yourself, and the others to learn better, and I can further improve this article.
