Have you ever wondered why you need to know matrix operations? The answer is simple: to work with matrices! I assume you already know why linear algebra matters in Machine Learning and that you are familiar with the basic definitions, so I will not argue for its importance here. In this tutorial, I explain the matrix operations that we need and encounter most frequently in Machine Learning.

Here’s what you will learn here:

  • The core matrix operations such as matrix transpose, multiplication, and inversion.
  • For each of the operations, you will learn how to implement it in Python.
  • I will explain the properties of each operation.

The assumption is that you are somewhat familiar with Python or are in the process of learning it. If you would like to learn Python the easy way, you can check my YouTube course online, which is freely available to all. However, you do NOT need to know Python to understand the concepts presented in this article.

Before You Move On

You may find the following resources helpful to better understand the concept of this article:

Matrix Transpose

The transpose of a matrix is an operator that switches the row and column indices of the matrix. Transposing \mathbf{A} produces a new matrix, denoted \mathbf{A}^T. The element in the i^{th} row and j^{th} column of \mathbf{A} becomes the element in the j^{th} row and i^{th} column of \mathbf{A}^T. If we write \mathbf{B} for \mathbf{A}^T, then we have the following:

    \[\mathbf{B}_{j,i}=\mathbf{A}_{i,j}\]

We can have the following example to clarify better:

    \[\begin{bmatrix} 2 &  3&  1\\  3 &  -4 & 2.2 \end{bmatrix} ^ T = \begin{bmatrix} 2 &  3\\  3 &  -4\\ 1 & 2.2 \end{bmatrix}\]

Calculation of a matrix transpose is deadly easy with Python. For instance, you can try the following code:

# Use Numpy package 
import numpy as np
# Define a 3x2 matrix using np.array
A = np.array([[1, 2.2], [4, 7], [8, -2]])
"""A = [[ 1.   2.2]
       [ 4.   7. ]
       [ 8.  -2. ]] """
print("A is: {}".format(A))
print("The shape of A is: {}".format(A.shape))
# Use transpose() method
B = A.transpose()
"""B = [[ 1.   4.   8. ]
       [ 2.2  7.  -2. ]]"""
print("B is: {}".format(B))
print("The shape of B is: {}".format(B.shape))

You should get the following output:

A is: [[ 1.   2.2]
 [ 4.   7. ]
 [ 8.  -2. ]]
The shape of A is: (3, 2)
B is: [[ 1.   4.   8. ]
 [ 2.2  7.  -2. ]]
The shape of B is: (2, 3)

NOTE: It is worth noting that (\mathbf{A}^T)^T=\mathbf{A}.
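
As a quick check of the definition and of the note above, here is a minimal sketch that uses the 2 \times 3 matrix from the worked example. The variable names elementwise_ok and double_transpose_ok are illustrative choices, and .T is NumPy's shorthand for transpose().

```python
import numpy as np

# The 2x3 matrix from the worked example above
A = np.array([[2, 3, 1], [3, -4, 2.2]])
B = A.T  # shorthand for A.transpose()

# Every element satisfies B[j, i] == A[i, j]
rows, cols = A.shape
elementwise_ok = all(
    B[j, i] == A[i, j] for i in range(rows) for j in range(cols)
)

# Transposing twice returns the original matrix
double_transpose_ok = np.array_equal(A.T.T, A)

print(B.shape, elementwise_ok, double_transpose_ok)
```

Both checks should report True, and the shape of B should be the shape of A with rows and columns swapped.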

Identity Matrix

The identity matrix, denoted \mathbf{I}_n, is a square matrix (number of rows = number of columns = n) whose elements along the main diagonal are all 1 and whose other elements are all zero. For example, we can have the following 4 \times 4 identity matrix:

    \[\begin{bmatrix} 1 &  0&  0& 0\\  0 &  1&  0& 0\\ 0 &  0 &  1& 0\\  0 &  0&  0 & 1 \end{bmatrix}\]

You can create the above matrix using the following code:

# Use Numpy package 
import numpy as np
# Define an Identity matrix
# Ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.eye.html
A = np.eye(4)
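
The identity matrix gets its name from how it behaves under matrix multiplication (covered later in this tutorial): multiplying a matrix by an identity matrix of matching size leaves it unchanged. This property is not stated above, so treat the following as a small illustrative sketch; the matrix values and variable names are arbitrary choices.

```python
import numpy as np

A = np.array([[1, 2.2], [4, 7], [8, -2]])  # 3x2 matrix
I3 = np.eye(3)  # 3x3 identity, multiplies A from the left
I2 = np.eye(2)  # 2x2 identity, multiplies A from the right

left_ok = np.array_equal(np.matmul(I3, A), A)   # I_3 A == A
right_ok = np.array_equal(np.matmul(A, I2), A)  # A I_2 == A

print(left_ok, right_ok)
```

Note that the identity on the left must have as many columns as A has rows, and the identity on the right must have as many rows as A has columns.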

Adding Operation

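Adding two matrices is possible only if both matrices have the same shape. Assume \mathbf{C} = \mathbf{A} + \mathbf{B}; each element of \mathbf{C} is the sum of the corresponding elements, \mathbf{C}_{i,j} = \mathbf{A}_{i,j} + \mathbf{B}_{i,j}. Let’s get back to Python and reuse the matrix \mathbf{A} and its transpose \mathbf{B} defined above. First we add \mathbf{A} to a random matrix of the same shape, and then we try to add \mathbf{A} and \mathbf{B}, whose shapes differ: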

# Use Numpy package 
import numpy as np
# Define a 3x2 matrix using np.array
A = np.array([[1, 2.2], [4, 7], [8, -2]])
# Use transpose() method
B = A.transpose()
# Create a matrix similar to A in shape but filled with random numbers
# Use *A.shape argument
A_like = np.random.randn(*A.shape)
# Add two matrices of the same shape
M = A + A_like
print("M equals to: ", M)
# Add two matrices with different shape
C = A + B
print("C equals to: ", C)

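You should get output similar to the following. The values of M differ from run to run because A_like is random, and the second addition raises an error: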

M equals to:  [[ 2.4212905   2.88158481]
 [ 4.34872344  5.01038501]
 [ 7.58194231 -2.0192284 ]]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-293b043ce9a9> in <module>()
     16 
     17 # Add two matrices with different shape
---> 18 C = A + B
     19 print("C equals to: ", C)
ValueError: operands could not be broadcast together with shapes (3,2) (2,3)

There are some important characteristics for adding matrices that you do not want to miss:

  • \mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}
  • (\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})
  • \mathbf{A} + \mathbf{0} = \mathbf{0} + \mathbf{A} = \mathbf{A}
  • (\mathbf{A}+\mathbf{B})^T=\mathbf{A}^T+\mathbf{B}^T

You can simply confirm the above properties with the following Python code:

# Use Numpy package 
import numpy as np
# Define three random 3x4 matrices using np.random.randint
# Ref: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html
A = np.random.randint(10, size=(3, 4))
B = np.random.randint(10, size=(3, 4))
C = np.random.randint(10, size=(3, 4))
# np.all() test whether all array elements are True.
# More info: https://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html
checkProperty = np.all(A + B == B + A)
if checkProperty: print('Property A + B == B + A is confirmed!')
checkProperty = np.all((A + B) + C == A + (B + C))
if checkProperty: print('Property (A + B) + C == A + (B + C) is confirmed!')
checkProperty = np.all(A + 0 == 0 + A)
if checkProperty: print('Property A + 0 == 0 + A is confirmed!')
checkProperty = np.all((A + B).transpose() == A.transpose() + B.transpose())
if checkProperty: print('Property (A + B)^T == A^T + B^T is confirmed!')

Scalar Multiplication

If we multiply a matrix by a number (a.k.a. scalar), the result equals multiplying every entry of the matrix by that scalar. For example, you can check the following calculation:

    \[2 * \begin{bmatrix} 2 &  3 &  1\\  3 &  -4 & 2.2 \end{bmatrix} = \begin{bmatrix} 4 &  6 &  2\\  6 &  -8 & 4.4 \end{bmatrix}\]

Scalar multiplication of matrices has the following properties:

  • (\alpha + \beta) \mathbf{A} = \alpha \mathbf{A} + \beta \mathbf{A}
  • \alpha (\mathbf{A}+\mathbf{B}) = \alpha \mathbf{A} + \alpha \mathbf{B}
  • 0 . \mathbf{A} = \mathbf{0}
  • 1 . \mathbf{A} = \mathbf{A}
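
The properties above can be checked numerically in the same style as the addition properties. This is a minimal sketch: the second matrix B and the scalars alpha and beta are arbitrary values chosen for illustration, and np.allclose is used instead of exact equality to tolerate floating-point rounding.

```python
import numpy as np

A = np.array([[2, 3, 1], [3, -4, 2.2]])   # matrix from the example above
B = np.array([[1, 0, -1], [2, 5, 0.5]])   # arbitrary matrix of the same shape
alpha, beta = 2.0, 3.0                    # arbitrary scalars

doubled = alpha * A  # multiplies every entry of A by alpha
print(doubled)

# (alpha + beta) A == alpha A + beta A
dist_scalars = np.allclose((alpha + beta) * A, alpha * A + beta * A)
# alpha (A + B) == alpha A + alpha B
dist_matrices = np.allclose(alpha * (A + B), alpha * A + alpha * B)
# 0 . A is the zero matrix, and 1 . A is A itself
zero_ok = np.array_equal(0 * A, np.zeros_like(A))
one_ok = np.array_equal(1 * A, A)

print(dist_scalars, dist_matrices, zero_ok, one_ok)
```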

Matrix Multiplication

Assume we would like to multiply two matrices and calculate the output as \mathbf{C} = \mathbf{A} \mathbf{B}. To do so, the dimensions of \mathbf{A} and \mathbf{B} must match. Here, matching does not mean being equal. Matching means the number of columns of \mathbf{A} must equal the number of rows of \mathbf{B}. For example, if the shape of \mathbf{A} is m \times q, then the shape of \mathbf{B} MUST be q \times n, so q is shared. As a result of such a multiplication, the shape of \mathbf{C} is m \times n.

Check the following Python code to have a better understanding of matrix multiplication:

# Use Numpy package 
import numpy as np
# Define two 3x4 random matrices using np.random.randint
# Ref: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html
A = np.random.randint(10, size=(3, 4))
B = np.random.randint(10, size=(3, 4))
print('Shape A is: {}'.format(A.shape))
print('Shape B is: {}'.format(B.shape))
# Calculate the number of columns in A and number of rows in B
A_num_columns = A.shape[1]
B_num_rows = B.shape[0]
# Check the dimensions
if A_num_columns != B_num_rows: print('dimension mismatch')
# You should get an error as A_num_columns != B_num_rows
C = np.matmul(A , B)

The output will be:

Shape A is: (3, 4)
Shape B is: (3, 4)
dimension mismatch
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-dc16b8d9d74a> in <module>()
     17 
     18 # You should get an error as A_num_columns != B_num_rows
---> 19 C = np.matmul(A , B)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 4)

Mathematically speaking, with \mathbf{A} of shape m \times q and \mathbf{B} of shape q \times n, we can calculate each element of \mathbf{C} as below:

    \[C_{i,j} = \sum_{k=1}^{q}A_{i,k}B_{k,j}=A_{i,1}B_{1,j} + ... + A_{i,q}B_{q,j}\]
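
To make the summation concrete, the formula can be written as three nested loops over plain Python lists. This is only a sketch for understanding, not how NumPy computes the product internally; the function name matmul_loops is hypothetical, the summation index is called k in the code, and the inputs are the 2 \times 3 and 3 \times 2 matrices used in this section's worked example.

```python
def matmul_loops(A, B):
    """Multiply two matrices given as lists of rows, using the sum formula."""
    m, q = len(A), len(A[0])
    if len(B) != q:
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):          # rows of A
        for j in range(n):      # columns of B
            for k in range(q):  # shared dimension
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[2, 3, -1], [7, 0, 2]]
B = [[2, 1], [2, -1], [1, 5]]
C = matmul_loops(A, B)
print(C)  # [[9, -6], [16, 17]]
```

In practice np.matmul is preferable because it is far faster on large matrices; the loops only spell out what each entry of the result means.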

An illustrative example is shown below:

    \[\begin{bmatrix} 2 &  3& -1\\  7& 0 & 2 \end{bmatrix} . \begin{bmatrix} 2 & 1\\  2 & -1\\ 1 & 5\\ \end{bmatrix} = \begin{bmatrix} 2\times2 + 3 \times 2 - 1 \times 1 &  \dotsb\\  7 \times 2 + 0 \times 2 + 2 \times 1 & \dotsb \end{bmatrix}\]

You can implement the above example with the following code and confirm the results:

# Use Numpy package 
import numpy as np
# Define two matrices
A = np.array([[2,3,-1],[7,0,2]])
B = np.array([[2,1],[2,-1],[1,5]])
print('Shape A is: {}'.format(A.shape))
print('Shape B is: {}'.format(B.shape))
# Calculate the number of columns in A and number of rows in B
A_num_columns = A.shape[1]
B_num_rows = B.shape[0]
# Check the dimensions
if A_num_columns != B_num_rows: print('dimension mismatch')
# The dimensions match (3 columns in A, 3 rows in B), so the multiplication succeeds
C = np.matmul(A , B)
print('C=AB= {}'.format(C))
# Instead of C=AB let's calculate C=BA
C = np.matmul(B , A)
print('C=BA= {}'.format(C))

The output will be:

Shape A is: (2, 3)
Shape B is: (3, 2)
C=AB= [[ 9 -6]
 [16 17]]
C=BA= [[11  6  0]
 [-3  6 -4]
 [37  3  9]]

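You can clearly see that \mathbf{AB} \neq \mathbf{BA}; here the two products do not even have the same shape. In general, matrix multiplication is not commutative. Properties of matrix multiplication are as below: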

  • (\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{A}(\mathbf{B}\mathbf{C})
  • \mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}
  • {(\mathbf{A}\mathbf{B})}^T=\mathbf{B}^T\mathbf{A}^T

You can check all the above properties with the following Python code:

# Use Numpy package 
import numpy as np
# Define three random 3x3 matrices using np.random.randint
# Ref: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html
A = np.random.randint(10, size=(3, 3))
B = np.random.randint(10, size=(3, 3))
C = np.random.randint(10, size=(3, 3))
# np.all() test whether all array elements are True.
# More info: https://docs.scipy.org/doc/numpy/reference/generated/numpy.all.html
checkProperty = np.all(np.matmul(A,np.matmul(B,C)) == np.matmul(np.matmul(A,B),C))
if checkProperty: print('Property A(BC) = (AB)C is confirmed!')
checkProperty = np.all(np.matmul(A,B+C) == np.matmul(A,B) + np.matmul(A,C))
if checkProperty: print('Property A(B+C) = AB + AC is confirmed!')
checkProperty = np.all(np.matmul(A,B).transpose() == np.matmul(B.transpose(),A.transpose()))
if checkProperty: print('Property (AB)^T = B^T.A^T is confirmed!')

It is important to address the matrix power. Assume you see something like \mathbf{A}^3. This simply means we multiply the matrix \mathbf{A} by itself so that \mathbf{A} appears three times in the product. It can be shown as below:

    \[\mathbf{A}^3 = \mathbf{A} \times \mathbf{A} \times \mathbf{A}\]

BUT, the matrix must be a square matrix, i.e., the number of its rows and columns MUST be the same. Otherwise, the dimensions do not match when we multiply the matrix by itself, and the power operation is invalid. Run the following code to see what happens:

# Calculate the matrix power for two square and non-square matrices.
# Use Numpy package 
import numpy as np
# A: 3x3 SQUARE matrix using np.array
A = np.random.randint(10, size=(3, 3))
print('A has the shape of: {}'.format(A.shape))
# Calculate A^3
n=3
output = np.linalg.matrix_power(A, n)
print('Output has the shape of: {}'.format(output.shape))
# B: 3x4 NON-SQUARE matrix using np.array
B = np.random.randint(10, size=(3, 4))
print('B has the shape of: {}'.format(B.shape))
# Try to calculate B^3: this raises LinAlgError because B is not square
n=3
output = np.linalg.matrix_power(B, n)
print('Output has the shape of: {}'.format(output.shape))
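
If you prefer the script to keep running after the invalid case, one option is to catch the exception. The sketch below uses small fixed matrices (my own choice, so the result is reproducible) to confirm that matrix_power agrees with repeated multiplication and that a non-square input is rejected with np.linalg.LinAlgError.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])  # 2x2 square matrix
cube = np.linalg.matrix_power(A, 3)

# A^3 is the same as multiplying A by itself: A A A
same = np.array_equal(cube, np.matmul(np.matmul(A, A), A))
print(cube, same)

B = np.ones((3, 4))  # 3x4 non-square matrix
try:
    np.linalg.matrix_power(B, 3)
    raised = False
except np.linalg.LinAlgError as err:
    raised = True
    print("Non-square input rejected:", err)
```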

Vector-Matrix Multiplication

First, let me repeat the following note from one of the previous posts:

NOTE: We usually represent a vector using one column and multiple rows of elements. Hence, we can treat a vector of size k as a k \times 1 matrix. Informally, we can say vectors are a special kind of matrix with only one column.

Similarly, to multiply a matrix \mathbf{A} by a vector \mathbf{v}, the dimensions must match. Assume our matrix \mathbf{A} has the size of m \times n and we would like to calculate \mathbf{w}=\mathbf{A}\mathbf{v}. What is the size of \mathbf{v}? Can you guess? The only acceptable size for \mathbf{v} is n \times 1, and the result \mathbf{w} has the size m \times 1. So we have the following operation:

    \[\begin{bmatrix} w_1\\  w_2\\  \vdots \\ w_m\end{bmatrix}^{m \times 1} = \begin{bmatrix} A_{11} &  A_{12} & \dotsb  & A_{1n}\\  A_{21} &  A_{22} & \dotsb  & A_{2n}\\  \vdots  &  \dotsb  &  \dotsb  & \vdots \\  A_{m1} &  \dotsb  &  \dotsb  & A_{mn} \end{bmatrix}^{m \times n} \begin{bmatrix} v_1\\  v_2\\  \vdots \\ v_n \end{bmatrix}^{n \times 1}\]

You can implement a working example as below:

# Use Numpy package 
import numpy as np
# A: 3x4 matrix using np.array
A = np.random.randint(10, size=(3, 4))
# v: A vector of size 4
v = np.random.randint(10, size=(4,))
print('Shape A is: {}'.format(A.shape))
print('Shape v is: {}'.format(v.shape))
# Calculate the number of columns in A and number of rows in v
A_num_columns = A.shape[1]
v_num_rows = v.shape[0]
# Check the dimensions
if A_num_columns != v_num_rows: print('dimension mismatch')
# The dimensions match (4 columns in A, 4 elements in v), so this succeeds
w = np.matmul(A , v)
print('w=Av=', w)
print('The shape of w is:',w.shape)
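
One detail worth seeing in code: NumPy distinguishes a 1-dimensional array of shape (n,) from an n \times 1 column matrix. The sketch below, with arbitrary fixed values of my own choosing, shows that np.matmul accepts both and that the shape of the result follows the shape of the vector.

```python
import numpy as np

A = np.array([[1, 2, 0, 1],
              [0, 1, 3, 2],
              [4, 0, 1, 1]])   # 3x4 matrix
v = np.array([1, 2, 3, 4])     # 1-D array, shape (4,)
v_col = v.reshape(-1, 1)       # explicit column vector, shape (4, 1)

w = np.matmul(A, v)            # result is 1-D, shape (3,)
w_col = np.matmul(A, v_col)    # result is a column, shape (3, 1)

print(w, w.shape)
print(w_col.shape)
```

The explicit column form matches the k \times 1 convention from the note above, while the 1-D form is what you will see most often in NumPy code.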

Matrix Inverse

The inverse of matrix \mathbf{A} exists and is denoted \mathbf{A}^{-1} when both of the following conditions hold:

  • \mathbf{A} is a square matrix
  • There is a matrix \mathbf{A}^{-1} whose product with \mathbf{A} is the identity matrix: \mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}

If the matrix \mathbf{A} is invertible, it is called non-singular. Otherwise, it is called non-invertible or singular. The concept of a matrix being singular is a relatively more advanced concept which we will explain in future posts.

You can practice the calculation of a matrix inverse using the following code:

# Calculate the inverse of a square matrix.
# Use Numpy package 
import numpy as np
# A: 3x3 SQUARE matrix using np.array
# NOTE: a random integer matrix can occasionally be singular,
# in which case np.linalg.inv raises LinAlgError
A = np.random.randint(10, size=(3, 3))
print('A has the shape of: {}'.format(A.shape))
# Calculate matrix inverse
ainv = np.linalg.inv(A)
print('The inverse of A is:',ainv)
# Check to see if multiplication of A and A^{-1} equals the identity matrix I
if np.allclose(np.dot(A, ainv), np.eye(A.shape[0])): print('A is invertible!')
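
Because a randomly generated matrix can occasionally be singular, it may help to see both outcomes with fixed inputs. In this sketch the helper try_inverse is hypothetical (not part of NumPy), and the two matrices are my own examples: A is invertible, while the second row of S is twice its first row, which makes S singular.

```python
import numpy as np


def try_inverse(M):
    """Return the inverse of M, or None if NumPy reports M as singular."""
    try:
        return np.linalg.inv(M)
    except np.linalg.LinAlgError:
        return None


A = np.array([[2.0, 1.0], [1.0, 1.0]])  # invertible
S = np.array([[1.0, 2.0], [2.0, 4.0]])  # singular: row 2 = 2 * row 1

A_inv = try_inverse(A)
S_inv = try_inverse(S)

# The inverse works from both sides: A A^{-1} = A^{-1} A = I
both_sides = (np.allclose(np.matmul(A, A_inv), np.eye(2))
              and np.allclose(np.matmul(A_inv, A), np.eye(2)))

print(A_inv)
print(S_inv is None, both_sides)
```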

The following properties hold for the matrix inversion operation, assuming \mathbf{A} and \mathbf{B} are invertible and \alpha \neq 0:

  • (\mathbf{A}\mathbf{B})^{-1}={\mathbf{B}}^{-1}{\mathbf{A}}^{-1}
  • (\mathbf{A}^T)^{-1}=(\mathbf{A}^{-1})^{T}
  • (\alpha\mathbf{A})^{-1}=(1/\alpha) \mathbf{A}^{-1}

You can check the aforementioned properties using the following Python script:

# Check the properties of the matrix inverse.
# Use Numpy package 
import numpy as np
# Define two 3x3 SQUARE matrices using np.random.randint
A = np.random.randint(10, size=(3, 3))
B = np.random.randint(10, size=(3, 3))
# np.round(A,2): rounds the array A to two decimal points
# Check property (AB)^{-1} = B^{-1}.A^{-1}
left_side = np.linalg.inv(np.matmul(A,B))
right_side = np.matmul(np.linalg.inv(B),np.linalg.inv(A))
checkProperty = np.all(np.round(left_side, 2) == np.round(right_side, 2))
if checkProperty: print('Property (AB)^{-1} = B^{-1}.A^{-1} is confirmed!')
# Check property (A^{T})^{-1} = (A^{-1})^{T}
left_side = np.linalg.inv(np.transpose(A))
right_side = np.transpose(np.linalg.inv(A))
checkProperty = np.all(np.round(left_side, 2) == np.round(right_side, 2))
if checkProperty: print('Property (A^{T})^{-1} = (A^{-1})^{T} is confirmed!')
# np.dot is a very important function.
# Ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
# Check property (alpha.A)^{-1} = (1/alpha)A^{-1}
alpha = float(3) # Any non-zero scalar (make sure to define a float)
left_side = np.linalg.inv(np.dot(alpha,A))
right_side = np.dot(1/alpha,np.linalg.inv(A))
checkProperty = np.all(np.round(left_side, 2) == np.round(right_side, 2))
if checkProperty: print('Property (alpha.A)^{-1} = (1/alpha)A^{-1} is confirmed!')

Matrix Trace

Assume we have a square matrix \mathbf{M}. The trace of \mathbf{M} is the sum of its diagonal elements, as below:

    \[Tr(\mathbf{M})=\sum_{k} \mathbf{M}_{k,k}\]

Working with the trace of a matrix, instead of the matrix itself, is easier for a lot of matrix calculations and mathematical proofs.

We have the following properties for the matrix trace operator:

  • Tr(\mathbf{M})=Tr(\mathbf{M}^{T})
  • Tr(\mathbf{AB})=Tr(\mathbf{BA}) if both products \mathbf{AB} and \mathbf{BA} exist.
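
Both properties can be checked with np.trace. The sketch below reuses the 2 \times 3 and 3 \times 2 matrices from the matrix multiplication example, so \mathbf{AB} is 2 \times 2 and \mathbf{BA} is 3 \times 3, yet their traces agree. The variable names are illustrative.

```python
import numpy as np

A = np.array([[2, 3, -1], [7, 0, 2]])    # 2x3
B = np.array([[2, 1], [2, -1], [1, 5]])  # 3x2

AB = np.matmul(A, B)  # 2x2
BA = np.matmul(B, A)  # 3x3

trace_AB = np.trace(AB)
trace_BA = np.trace(BA)
print(trace_AB, trace_BA)  # prints: 26 26

# Tr(M) == Tr(M^T), checked here with M = BA
transpose_ok = np.trace(BA) == np.trace(BA.T)
print(transpose_ok)
```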

Matrix Determinant

The determinant of the matrix \mathbf{A} \in \mathbb{R}^{N \times N} is a scalar value that we compute from the elements of the matrix. The matrix MUST be square for the determinant to be defined. Furthermore, the determinant tells us whether the matrix is invertible: \mathbf{A} is singular exactly when its determinant is zero.
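
In NumPy the determinant is available as np.linalg.det, which accepts any square matrix. As a minimal sketch with two hand-picked 2 \times 2 matrices (my own examples), the code below compares the result with the familiar formula ad - bc and shows that a matrix with linearly dependent rows has a determinant of zero, up to floating-point rounding.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])  # ad - bc = 2*1 - 1*1 = 1
S = np.array([[1.0, 2.0], [2.0, 4.0]])  # ad - bc = 1*4 - 2*2 = 0

det_A = np.linalg.det(A)
det_S = np.linalg.det(S)

# Compare with a tolerance because the result is a floating-point number
A_is_singular = np.isclose(det_A, 0.0)
S_is_singular = np.isclose(det_S, 0.0)

print(det_A, det_S)
print(A_is_singular, S_is_singular)
```

Comparing against zero with a tolerance is a pragmatic choice for small examples; for large or badly scaled matrices the determinant alone is not a reliable numerical test of invertibility.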

Special Matrices

There are special kinds of matrices whose names you may hear every day. I want to briefly describe some of them here.

Symmetric matrix: A symmetric matrix equals its transpose, as below:

    \[\mathbf{M} = \mathbf{M}^T\]

An example would be the following matrix:

    \[\begin{bmatrix} 1 & -1\\   -1& 3 \end{bmatrix}\]

Diagonal matrix: A diagonal matrix can have non-zero elements only on its main diagonal. Everywhere else in the matrix, we have zero elements.

    \[\mathbf{M}_{i,j} = 0, i \neq j\]

In other words, only the elements with i = j may be non-zero; they are free to take any value, including zero.

We can have the following example:

    \[\begin{bmatrix} 1 &  0 & 0\\  0 & 3 & 0\\  0 & 0 & -1 \end{bmatrix}\]

NOTE: A diagonal matrix does not have to be a square matrix. The main diagonal of a matrix consists of the elements \mathbf{M}_{i,j} with i=j.

Orthogonal matrix: An orthogonal matrix is a square matrix that satisfies the following:

    \[\mathbf{M}\mathbf{M}^{T}=\mathbf{M}^{T}\mathbf{M}=\mathbf{I}\]

Note that since multiplying \mathbf{M} by \mathbf{M}^T results in the identity matrix, this implies that \mathbf{M}^T=\mathbf{M}^{-1}.
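
The three definitions above translate directly into short NumPy checks. In this sketch the symmetric and diagonal matrices are the examples from this section, while the orthogonal matrix is a 2-D rotation matrix, which is my own choice of example; all variable names are illustrative.

```python
import numpy as np

# Symmetric: M equals its transpose
S = np.array([[1, -1], [-1, 3]])
is_symmetric = np.array_equal(S, S.T)

# Diagonal: zero everywhere off the main diagonal.
# np.diag builds a square diagonal matrix from a list of diagonal entries,
# and extracts the diagonal when given a 2-D array.
D = np.diag([1, 3, -1])
is_diagonal = np.array_equal(D, np.diag(np.diag(D)))

# Orthogonal: M M^T = M^T M = I (a rotation by 30 degrees, as an example)
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
is_orthogonal = (np.allclose(np.matmul(Q, Q.T), np.eye(2))
                 and np.allclose(np.matmul(Q.T, Q), np.eye(2)))

# For an orthogonal matrix, the transpose is the inverse
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)

print(is_symmetric, is_diagonal, is_orthogonal, inverse_is_transpose)
```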

Conclusion

In this tutorial, I explained the important matrix operations that are commonly used in Machine Learning. To understand the definitions better, you can refer to a previously published article, titled Basic Linear Algebra Definitions that You Hear Every Day. First, I explained each operation, and then I showed you how to code it in Python. As a result, you should now have a sense of both the theoretical interpretation of these operations and their practical implementation in Python. Hopefully, you found this helpful for gaining a better understanding of the concepts.

P.S. Please share with me your thoughts by commenting below. I might be wrong in what I say, and I love to know when I am wrong. Furthermore, your questions might be my questions, as well. It’s always good to become better even if being the best is impossible in our belief system. So let’s help each other to become better.
