This article is dedicated to Numpy, one of the most important scientific computing libraries for Linear Algebra and, hence, Machine Learning!

In almost any Python program for Machine Learning, you see the “Numpy” library being used! Why? Doing Machine Learning is impossible without Linear Algebra, and Linear Algebra is built on vectors, matrices, etc. Well, Numpy is one of the best scientific computing packages for Linear Algebra! So that is the short answer to the “why” question!

You can do a simple 30-minute search to find out if I am right or wrong! Search the web for Machine Learning projects and their available source code. Check how many of them use Numpy. Look at the Python code and see how many times you find the expression “import numpy” or “import numpy as np” at the top of a file. Do it! It will definitely motivate you, unless you are already aware of the importance of Numpy.

In this article, I am going to talk about Numpy, one of the most important scientific computing libraries. Here is what you will learn in this tutorial:

  • What is Numpy?
  • How can we define arrays in Numpy?
  • What are the basic operations?
  • How to leverage Numpy for Machine Learning?
  • What are the most used tips and tricks in using Numpy?

Data Types

Here, I describe some of the most important Numpy numeric data types that you frequently encounter. Numpy supports numerous data types, perhaps even more than Python itself!

Basic Data Types

The most basic data types are as follows: integer (ex: 1, 2, -10), float (ex: 1.1, 3.24, -7.00111), complex (ex: 1+2j, 2.1+3j, -2+1.4j), and boolean (ex: True, False). You can use Numpy to convert Python objects in many different ways, such as changing a number or an array from one type to another specific type. Let’s start with the following examples:

# Import Numpy library
import numpy as np
# Define a number
a = 10
print('Type, before converting: ', type(a))
# Change the type to float with Numpy
b = np.float64(a)
print('Type, after converting: ', type(b))
# Change the type to float with the Python built-in float function
c = float(a)
print('Type, after converting: ', type(c))

Type, before converting:  <class 'int'>
Type, after converting:  <class 'numpy.float64'>
Type, after converting:  <class 'float'>

Question: What is the difference between using np.float64 and the Python float built-in function?
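
One way to explore this question is to inspect both objects. The sketch below is only a starting point, not a complete answer: it compares the two values and checks which attributes each one carries.

```python
import numpy as np

a = 10
b = np.float64(a)   # Numpy scalar
c = float(a)        # Python built-in float

# Both hold the same double-precision value
print(b == c)                # True
# np.float64 is a subclass of the Python float type
print(isinstance(b, float))  # True
# The Numpy scalar also carries array-like attributes such as dtype
print(b.dtype)               # float64
# The plain Python float has no such attribute
print(hasattr(c, 'dtype'))   # False
```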

Type Conversion

Now, let’s turn a list into a Numpy array. For now, let’s just focus on the data types and conversion. Later in this article, I will explain how to define Numpy arrays in detail.

# Import Numpy library
import numpy as np
# Define a Python list
a = [1, 2, 1.3]
print('Type "a": ', type(a))
# Turn Python list into a Numpy array of type np.int32 (Integer (-2147483648 to 2147483647))
b = np.int32(a)
print('Array "b": ', b)
print('Type array "b": ', type(b))
print('Array "b" data type: ', b.dtype)
# Turn Python list into a Numpy array of type np.float32 (single precision; the Python float is double precision)
c = np.float32(a)
print('Array "c": ', c)
print('Type array "c": ', type(c))
print('Array "c" data type: ', c.dtype)

Type "a":  <class 'list'>
Array "b":  [1 2 1]
Type array "b":  <class 'numpy.ndarray'>
Array "b" data type:  int32
Array "c":  [1.  2.  1.3]
Type array "c":  <class 'numpy.ndarray'>
Array "c" data type:  float32

We used the .dtype attribute to find out the data type of the elements inside the array. The recommended way to change the type of a Numpy array is the .astype() method. Take a look at the following example:

# Import Numpy library
import numpy as np
# Define a Python list
a = [1, 2, 4]
print('Type "a": ', type(a))
# Turn Python list into a Numpy array of type np.int32 (Integer (-2147483648 to 2147483647))
b = np.int32(a)
print('Array "b": ', b)
print('Type array "b": ', type(b))
# Convert the int32 array "b" into an array of type np.float32 using .astype()
c = b.astype(np.float32)
print('Array "c": ', c)
print('Type array "c": ', type(c))

Type "a":  <class 'list'>
Array "b":  [1 2 4]
Type array "b":  <class 'numpy.ndarray'>
Array "c":  [1. 2. 4.]
Type array "c":  <class 'numpy.ndarray'>
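
Two details of .astype() are easy to miss: it returns a new array and leaves the original untouched, and converting floats to integers drops the fractional part. Here is a small sketch with made-up values (the array x is my own example, not taken from the code above):

```python
import numpy as np

# Hypothetical example values with fractional parts
x = np.array([1.9, -2.7, 3.2])

# Converting float to integer truncates toward zero (no rounding)
y = x.astype(np.int32)
print(y.tolist())        # [1, -2, 3]
print(x.dtype, y.dtype)  # float64 int32

# The original array is unchanged because .astype() returns a new array
print(x.tolist())        # [1.9, -2.7, 3.2]
```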

Defining a Numpy Array

An array is basically a one- or multi-dimensional grid of values. In a Numpy array, in particular, all values are of the same type (integer, float, etc.). How are we going to define a Numpy array? For a Numpy array, we have the following definitions:

  • Rank: The number of dimensions an array has.
  • Shape: A tuple that indicates the number of elements in each dimension. Ex: A shape of (2,4,10) indicates that we have a three-dimensional array which has 2, 4, and 10 elements in the first, second, and third dimension, respectively.
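
To make the two definitions concrete, here is a minimal sketch that builds an array with the shape (2,4,10) from the example and reads its rank and shape back. The variable name arr is just for illustration.

```python
import numpy as np

# An all-zero array with the shape used in the example above
arr = np.zeros((2, 4, 10))

print(arr.ndim)   # 3 -> the rank (number of dimensions)
print(arr.shape)  # (2, 4, 10) -> elements per dimension
print(arr.size)   # 80 -> total number of elements (2 * 4 * 10)
```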

Create an array from a list

Now let’s get started with Python. We start with the most common approach. Let’s create a Numpy array from a list:

# Import Numpy library
import numpy as np
# Define a Python list
mylist = [1, 2, 4, 8]
# Create a Numpy array from the list
numpy_array = np.array(mylist)
print('Array: ', numpy_array)

Array:  [1 2 4 8]
Above, we used the np.array function to transform a list into a Numpy array.

What if we do not define a list and just input the numbers as below:

# Import Numpy library
import numpy as np
# Naively input the numbers
numpy_array = np.array(1,2,4,8)
print('Array: ', numpy_array)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-6d0f836d151e> in <module>()
      2 
      3 # Naively input the numbers
----> 4 numpy_array = np.array(1,2,4,8)
      5 print('Array: ', numpy_array)
ValueError: only 2 non-keyword arguments accepted

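As you can see above, Python complains!! The np.array function expects all the values to arrive as ONE sequence (such as a list) in its first argument. The exact error type and message may differ depending on your Numpy version.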
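
Here is a short sketch of what np.array accepts instead. The second positional argument is the dtype, which is why extra bare numbers are rejected. I catch both TypeError and ValueError because, as far as I know, the exception type has changed between Numpy versions.

```python
import numpy as np

# Correct: pass the numbers as ONE list
ok = np.array([1, 2, 4, 8])
print(ok.tolist())     # [1, 2, 4, 8]

# The second positional argument is the data type
as_float = np.array([1, 2, 4, 8], np.float32)
print(as_float.dtype)  # float32

# Passing bare numbers fails
failed = False
try:
    np.array(1, 2, 4, 8)
except (TypeError, ValueError) as err:
    failed = True
    print('np.array complained:', type(err).__name__)
```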

Now, let’s define a two-dimensional array:

# Import Numpy library
import numpy as np
# Define the two rows as Python lists
row1 = [2,4,6,8]
row2 = [1,3,5,7]
numpy_array = np.array([row1,row2])
print('Array: ', numpy_array)
# Get the shape
print('Shape: ', numpy_array.shape)

Array:  [[2 4 6 8]
 [1 3 5 7]]
Shape:  (2, 4)

Let’s take a look at the above code once again. We defined a matrix. The argument inside np.array is a list in which each element is another list (see figure above)! The inner lists, denoted as row1 and row2, form the rows of the matrix and MUST have the same size! Can you think why? That was a simple example to showcase how we can create arrays. I used the .shape attribute to get the shape of the Numpy array. The output above shows we have a matrix with two rows and four columns.

Special functions

We can create Numpy arrays using some special Numpy functions. I used a couple of them below for your reference:

# Import Numpy library
import numpy as np
### Defining arrays using special functions ###
# Arguments:
#   shape: The shape of the numpy array
#   dtype: Specifying the data type (not required) 
# Defining an all-zero array
zeroArray = np.zeros(shape=(3,5), dtype=np.int16 )
print("zeroArray: ", zeroArray)
# Defining an all-one array
onesArray = np.ones(shape=(3,5), dtype=np.float32 )
print("onesArray: ", onesArray)
# Defining an array filled with one specific value
fullArray = np.full(shape=(3,5), fill_value=4.2, dtype=np.float64 )
print("fullArray: ", fullArray)

zeroArray:  [[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
onesArray:  [[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
fullArray:  [[4.2 4.2 4.2 4.2 4.2]
 [4.2 4.2 4.2 4.2 4.2]
 [4.2 4.2 4.2 4.2 4.2]]

One of the most important special functions is np.arange, which is similar to the Python built-in range function. An example of using np.arange is as below:

# Import Numpy library
import numpy as np
# Define a Numpy array using np.arange
# np.arange defines an interval of numbers
# Arguments:
#   start: Starting of the interval.  
#   stop: Ending of the interval.
#   step: Step size.
# NOTE: The interval includes "start" number but does NOT include "stop" number.
arr = np.arange(start=3,stop=10,step=2)
print("Array: ", arr)

Array:  [3 5 7 9]
Numpy indexing starts from 0. By defining np.arange(start=3,stop=10,step=2), the start value is 3. The stop value is 10 BUT it is NOT included in the range. A step size of 2 means every other number is picked, i.e., we pick 3, 5, 7, and 9 and jump over 4, 6, and 8 (the green and orange indexes in the figure, respectively)!!
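
Since np.arange is modeled after the Python range function, a quick side-by-side comparison may help. The sketch below also shows one difference: np.arange accepts a non-integer step, while range does not. The name half_steps is my own.

```python
import numpy as np

# Same interval with np.arange and with the built-in range
arr = np.arange(start=3, stop=10, step=2)
print(arr.tolist() == list(range(3, 10, 2)))  # True

# Unlike range, np.arange also accepts a non-integer step
half_steps = np.arange(0, 2, 0.5)
print(half_steps.tolist())  # [0.0, 0.5, 1.0, 1.5]
```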

Universal functions

Numpy provides some universal functions for performing element-wise operations. Take a look at some of the universal functions demonstrated below:

# Import Numpy library
import numpy as np
v = np.array([1, 3, 5])
w = np.array([2, 4, 6])
# Exponential operation
print('e^v= ', np.exp(v))
# The square root of an array
print('v^{1/2}= ', np.sqrt(v))
# Adding two vectors
print('v+w= ', np.add(v, w))
# Sin and Cos of an array
print('Sin(v)= ', np.sin(v))
print('Cos(v)= ', np.cos(v))

Random Array

Sometimes, we want to define an array with random numbers. There are many ways to do this, depending on the kind of random numbers we want to use. For example, do we want to generate random integers, floats, etc.? Take a look at the approaches below:

# Import Numpy library
import numpy as np
### Defining arrays with random elements ###
# Arguments:
#   size: The size of the numpy array
# Defining a random array
randArray = np.random.random(size=(2,3))
print("randArray: ", randArray)
# Defining a random array with integer elements
# high: The highest element that is allowed to be generated is ``high-1``
# low: The lowest integer that is allowed to be generated
randintArray = np.random.randint(low=1, high=5, size=(2,3))
print("randintArray: ", randintArray)

randArray:  [[0.65055144 0.12510875 0.09906024]
 [0.47333278 0.67082837 0.7673982 ]]
randintArray:  [[4 4 1]
 [1 2 2]]

CAVEAT: As we generated random arrays, if you run the above Python code, you will almost certainly NOT get the same results I reported above! It was obvious, though. Right?
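
If you do need repeatable results, one common option is to fix the seed of the random number generator before generating the numbers. The sketch below uses np.random.seed with an arbitrary seed value of 0 (my choice, any integer works) and checks that two runs produce identical arrays.

```python
import numpy as np

# Fix the seed, then generate random integers
np.random.seed(0)
first = np.random.randint(low=1, high=5, size=(2, 3))

# Reset the same seed and generate again
np.random.seed(0)
second = np.random.randint(low=1, high=5, size=(2, 3))

# With the same seed, both arrays are identical
print(np.array_equal(first, second))  # True
```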

Basic Operations

Basic arithmetic operation

Let’s first cover the basic arithmetic operations with some examples:

# Import Numpy library
import numpy as np
# Create two arrays
a = np.array([[1,5],[0,8]], dtype=np.float32)
b = np.array([[2,1],[5,2]], dtype=np.float32)
print('a= ', a)
print('b= ', b)
# Elementwise adding
# a + b and np.add(a, b) are the same.
# Check to see if a + b and np.add(a, b) are the same using ``assert``.
# The .all() method checks if all elements of an array are True.
print('a + b= ', a + b)
assert((a + b == np.add(a, b)).all())
# Elementwise subtraction
print('a - b= ', a - b)
assert((a - b == np.subtract(a, b)).all())
# Elementwise multiplication
print('a * b= ', a * b)
assert((a * b == np.multiply(a, b)).all())
# Elementwise division
print('a / b= ',a / b)
assert((a / b == np.divide(a, b)).all())
# Elementwise square
print('a^2= ', np.square(a))
assert((a ** 2 == np.square(a)).all())

a=  [[1. 5.]
 [0. 8.]]
b=  [[2. 1.]
 [5. 2.]]
a + b=  [[ 3.  6.]
 [ 5. 10.]]
a - b=  [[-1.  4.]
 [-5.  6.]]
a * b=  [[ 2.  5.]
 [ 0. 16.]]
a / b=  [[0.5 5. ]
 [0.  4. ]]
a^2=  [[ 1. 25.]
 [ 0. 64.]]
Element-wise summation. In element-wise operations, the operator acts on the corresponding elements of the two matrices.
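
A common source of confusion is that a * b is NOT the matrix product. As a small sketch using the same a and b as above, compare the element-wise product with np.matmul:

```python
import numpy as np

a = np.array([[1, 5], [0, 8]], dtype=np.float32)
b = np.array([[2, 1], [5, 2]], dtype=np.float32)

# Element-wise product: multiplies corresponding elements
elementwise = a * b
print(elementwise.tolist())  # [[2.0, 5.0], [0.0, 16.0]]

# Matrix product: rows of a times columns of b
matrix_product = np.matmul(a, b)
print(matrix_product.tolist())  # [[27.0, 11.0], [40.0, 16.0]]
```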

Define a Vector – Linear Algebra

Let’s talk about how we define a vector with Numpy. Assume we would like to define a column vector which has a size of k \times 1. Take a careful look at the code below and the shape of the arrays:

# Import Numpy library
import numpy as np
# Rank-1 array
v = np.array([0,8])
print('Shape: ', v.shape)
# Rank-2 array (row vector)
v = np.array([[0,8]])
print('Shape: ', v.shape)
# Rank-2 array (column vector)
v = np.array([[0],[8]])
print('Shape: ', v.shape)

In the above code, we defined arrays with the same numeric values but with different ranks and shapes. In the first definition, np.array([0,8]), we defined a rank-1 array (it has only one dimension). In the second, np.array([[0,8]]), we defined a rank-2 array which is a row vector (1 row and multiple columns). For defining vectors, the preference is the third definition, np.array([[0],[8]]), which results in a rank-2 array and a column vector (multiple rows and 1 column). The output of the above code is as below:

Shape:  (2,)
Shape:  (1, 2)
Shape:  (2, 1)

Remember we do NOT usually need to define vectors as we did in the second and third definitions above. That approach is a little bit complicated with all those nested Python lists! Now let’s do it the easy way:

import numpy as np
# Rank-1 array
v = np.array([0,8])
print('Shape: ', v.shape)
# Rank-2 array (row vector)
row_v = v.reshape(1,-1)
print('Shape: ', row_v.shape)
# Rank-2 array (column vector)
column_v = v.reshape(-1,1)
print('Shape: ', column_v.shape)

What did I do above? (1) I used the Numpy “reshape” method, which simply changes the shape of the array to the desired shape (details later in this tutorial). (2) I used “-1”, which tells Numpy to work out the size of that dimension from the total number of elements. (3) I used “1”, which sets the size of the other dimension to exactly one!

Let me explain the line row_v = v.reshape(1,-1) of the above code for further illustration. (1) “1” is the number of rows. (2) “-1” stands for the number of columns, which Numpy sets to the total number of elements of the vector \mathbf{v}, i.e., 2. (3) The Numpy “reshape” method therefore changes the shape of \mathbf{v} to (1,2), which means the new vector (row_v) has 2 columns and only one row! It is worth emphasizing that row_v is a row vector as it only has one row.

NOTE: In simple words, (1,-1) means put only one row and place all elements in columns, and (-1,1) means put only one column and place all elements in rows. Check the below figure.
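
The “-1” trick is not limited to vectors with two elements. The sketch below uses a six-element array (x is my own example) to show how Numpy infers the missing dimension, and what happens when the sizes cannot match:

```python
import numpy as np

x = np.arange(6)  # [0 1 2 3 4 5]

# One column, all elements in rows
print(x.reshape(-1, 1).shape)  # (6, 1)
# One row, all elements in columns
print(x.reshape(1, -1).shape)  # (1, 6)
# Two rows, so Numpy infers three columns (6 / 2)
print(x.reshape(2, -1).shape)  # (2, 3)

# Six elements cannot be split into four equal rows
mismatch = False
try:
    x.reshape(4, -1)
except ValueError as err:
    mismatch = True
    print('reshape failed:', err)
```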

Matrix/Vector Operations – Linear Algebra

This section is dedicated to what we mostly use in Machine Learning: operations on vectors and matrices. Let’s take a look:

# Import Numpy library
import numpy as np
# Create two vectors and two matrices
v = np.array([0,8]).reshape(-1,1)
u = np.array([1,4]).reshape(-1,1)
A = np.array([[2,1],[5,2]])
B = np.array([[2,1],[5,2]])
# Dot product of two vectors with two approaches
print('v.u = ', v.dot(u.transpose()))
print('v.u = ', np.dot(v, u.transpose()))
# Product of a vector with a matrix
print('A.v = ', A.dot(v))
print('A.v = ', np.dot(A, v))
# Matrix product with three approaches
print('A.B = ', A.dot(B))
print('A.B = ', np.dot(A, B))
print('A.B = ', np.matmul(A,B))

Let’s practice. Run the above code and answer the following questions:

  • What is the shape and rank of \mathbf{v} and \mathbf{u}?
  • In the two lines that calculate ‘v.u’, did we have to use “.transpose()”? Why?
  • Instead of calculating ‘v.u’ how would you calculate ‘u.v’?
  • Take a look at the two lines that calculate ‘A.v’. Instead of ‘A.v’, can we calculate ‘v.A’?
You may see the visual answer to some of the questions above! Dimension matching is crucial in multiplying vectors and matrices.
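
If you want to check your answers, the following sketch prints the shapes involved. It is one possible way to explore the questions, reusing v, u, and A from the code above; the names outer, inner, and vA are mine.

```python
import numpy as np

v = np.array([0, 8]).reshape(-1, 1)
u = np.array([1, 4]).reshape(-1, 1)
A = np.array([[2, 1], [5, 2]])

# (2,1) times (1,2) gives a (2,2) matrix
outer = v.dot(u.transpose())
# (1,2) times (2,1) gives a (1,1) matrix holding a single number
inner = v.transpose().dot(u)
print(outer.shape, inner.shape)  # (2, 2) (1, 1)
print(inner.item())              # 32

# (2,1) times (2,1): inner dimensions do not match
mismatch = False
try:
    v.dot(u)
except ValueError:
    mismatch = True
print('v.dot(u) failed:', mismatch)  # v.dot(u) failed: True

# (1,2) times (2,2) gives a (1,2) row vector
vA = v.transpose().dot(A)
print(vA.tolist())  # [[40, 16]]
```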

I have used np.matmul in one of the previous posts. Now, let’s discuss the operations that we frequently use in Machine Learning: sum and mean over a whole matrix, or along a specific dimension:

# Import Numpy library
import numpy as np
# Create a matrix
A = np.array([[2,1,3,4],[5,2,9,4]])
print('A=', A)
# Sum and mean over the matrix
print('sum(A) = ', np.sum(A))
print('mean(A) ', np.mean(A))
# Sum and mean over axis zero (rows)
print('Sum over rows = ', np.sum(A, axis=0))
print('Mean over rows = ', np.mean(A, axis=0))
# Sum and mean over axis one (columns)
print('Sum over columns = ', np.sum(A, axis=1))
print('Mean over columns = ', np.mean(A, axis=1))

A= [[2 1 3 4]
 [5 2 9 4]]
sum(A) =  30
mean(A)  3.75
Sum over rows =  [ 7  3 12  8]
Mean over rows =  [3.5 1.5 6.  4. ]
Sum over columns =  [10 20]
Mean over columns =  [2.5 5. ]

NOTE: When we take the sum/mean over a specific axis, that axis is collapsed and no longer appears in the shape of the result. In the example above, \mathbf{A} has shape (2,4). If we take the sum/mean over dimension zero (one), all rows (columns) are combined, and the number of elements in the result is equal to the number of columns (rows) in the main matrix \mathbf{A}, i.e., 4 (2).
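
The shapes make this easier to see. The sketch below prints the shape of each result and also shows the keepdims=True argument of np.sum, which keeps the collapsed axis with size one instead of dropping it:

```python
import numpy as np

A = np.array([[2, 1, 3, 4], [5, 2, 9, 4]])  # shape (2, 4)

# By default the summed axis disappears from the shape
print(np.sum(A, axis=0).shape)  # (4,)
print(np.sum(A, axis=1).shape)  # (2,)

# keepdims=True keeps the summed axis with size 1
print(np.sum(A, axis=0, keepdims=True).shape)  # (1, 4)
print(np.sum(A, axis=1, keepdims=True).shape)  # (2, 1)
```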

Array Manipulation

After becoming familiar with Numpy arrays, now it’s time to learn how to play with arrays.

Indexing

First, let’s define and slice an array.

# Import Numpy library
import numpy as np
# Create a matrix
A = np.array([[2,1,3,4],[5,2,9,4],[5,2,10,1],[2,2,11,-1]])
print('A=\n', A)
# Extract the first two rows
# Remember 0:2 in indexing means {0,1} and does NOT include 2!
# Using : merely, means ALL!
print('The first two rows= \n', A[0:2,:])
# Extract the first three rows and the last two columns
# -2: means the second to the last to the end!
# 0:3 mean {0,1,2}
print('The first three rows and last two columns= \n', A[0:3,-2:])
# Let's point to one element
# The second row (index 1) and third column (index 2)
# Remember Python indexing starts from zero!!
print('The second row and third column= \n', A[1,2])

A=
 [[ 2  1  3  4]
 [ 5  2  9  4]
 [ 5  2 10  1]
 [ 2  2 11 -1]]
The first two rows= 
 [[2 1 3 4]
 [5 2 9 4]]
The first three rows and last two columns= 
 [[ 3  4]
 [ 9  4]
 [10  1]]
The second row and third column= 
 9
Left figure: Shows A[0:3,-2:], which is the first three rows and the last two columns. Right figure: Shows A[1,2], which is one single element located in the second row and third column.

Let’s review the code above. In A[0:2,:] and A[0:3,-2:], I used slice indexing by selecting a range of indices. In A[1,2], I only used integer indices. We can simply combine both, but there might be a difference in the rank of the output array. Check the example below:

# Import Numpy library
import numpy as np
# Create a matrix
A = np.array([[2,1,3,4],[5,2,9,4],[5,2,10,1],[2,2,11,-1]])
print('A=\n', A)
# Extract the first row and all columns using two approaches
print('The first row with slice indexing= \n', A[0:1,:]) # Slice indexing
print('The first row with integer indexing= \n', A[0,:])  # Integer indexing
print('The first row shape slice indexing= \n', A[0:1,:].shape) # Slice indexing
print('The first row shape integer indexing= \n', A[0,:].shape)  # Integer indexing

A=
 [[ 2  1  3  4]
 [ 5  2  9  4]
 [ 5  2 10  1]
 [ 2  2 11 -1]]
The first row with slice indexing= 
 [[2 1 3 4]]
The first row with integer indexing= 
 [2 1 3 4]
The first row shape slice indexing= 
 (1, 4)
The first row shape integer indexing= 
 (4,)

If you look at the results, the shapes of the two outputs are different. Basically, with slice indexing, we get a rank-2 array, and with integer indexing, we get a rank-1 array. Be careful about this difference when you are dealing with Numpy indexing.
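
If you end up with a rank-1 row but need the rank-2 version, reshape can bring the lost dimension back. Here is a minimal sketch with the same matrix A (the names row and row2d are mine). The last line also confirms that the negative index -2: is the same as 2: for a matrix with four columns.

```python
import numpy as np

A = np.array([[2, 1, 3, 4], [5, 2, 9, 4], [5, 2, 10, 1], [2, 2, 11, -1]])

# Integer indexing drops one dimension
row = A[0, :]
print(row.ndim)  # 1

# reshape(1, -1) turns it back into a rank-2 row
row2d = row.reshape(1, -1)
print(row2d.shape)  # (1, 4)
print(np.array_equal(row2d, A[0:1, :]))  # True

# With four columns, -2: and 2: select the same columns
print(np.array_equal(A[0:3, -2:], A[:3, 2:]))  # True
```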

Shaping

# Import Numpy library
import numpy as np
# Create a matrix
A = np.array([[2,1,3,4],[5,2,9,4],[5,2,10,1]])
print('A=\n', A)
print('Shape of A=\n', A.shape)
# Reshape A to the new shape of (2,6)
B = A.reshape(2,6)
print("B: \n", B)
# Reshape A to the new shape of (4,x)
# If we use -1, the remaining dimension will be chosen automatically.
C = A.reshape(4,-1)
print("C: \n", C)
# Flatten operation
print("Flatten A: \n", A.ravel())

A=
 [[ 2  1  3  4]
 [ 5  2  9  4]
 [ 5  2 10  1]]
Shape of A=
 (3, 4)
B: 
 [[ 2  1  3  4  5  2]
 [ 9  4  5  2 10  1]]
C: 
 [[ 2  1  3]
 [ 4  5  2]
 [ 9  4  5]
 [ 2 10  1]]
Flatten A: 
 [ 2  1  3  4  5  2  9  4  5  2 10  1]

The question is: how do reshaping operations work? Above we had the matrix \mathbf{A} of size (3,4) with 12 (3 \times 4) total elements. When we use np.reshape, the default Numpy order is “C-style”, that is, the rightmost index “changes the fastest” while the elements are read. Let’s use the above example of .ravel() to flatten the matrix: the first element is obviously \mathbf{A}_{0,0} and the next one is \mathbf{A}_{0,1}. With .ravel(), the elements are read and placed in the new array as below:

    \[[ \mathbf{A}_{0,0}, \mathbf{A}_{0,1}, \mathbf{A}_{0,2}, \mathbf{A}_{0,3},  \mathbf{A}_{1,0}, \cdots, \mathbf{A}_{2,2}, \mathbf{A}_{2,3} ]\]
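
To convince yourself of this ordering, you can rebuild the flattened array by hand with two nested loops. In the sketch below, the inner loop runs over the rightmost index, which matches the default “C-style” order. For contrast, I also show order='F' (column-major order), an option of .ravel() that makes the leftmost index change the fastest.

```python
import numpy as np

A = np.array([[2, 1, 3, 4], [5, 2, 9, 4], [5, 2, 10, 1]])
rows, cols = A.shape

# C-style: the column index j (rightmost) changes the fastest
c_order = [int(A[i, j]) for i in range(rows) for j in range(cols)]
print(c_order == A.ravel().tolist())  # True

# F-style: the row index i (leftmost) changes the fastest
f_order = [int(A[i, j]) for j in range(cols) for i in range(rows)]
print(f_order == A.ravel(order='F').tolist())  # True
```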

Conclusion

In this tutorial, you learned how to use Numpy. You also saw how important it is for Machine Learning purposes. But clearly this tutorial is NOT a panacea (a cure for everything!) for all your Numpy needs, nor is it flawless! You always need to explore more. You can learn more efficiently if you practice on your own. Use the above code as a starting point and try to play around with it. Feel free to ask questions and share your comments, as I am sure it can help you, me, and every other reader to learn more. This tutorial is subject to change and I would be happy to have your suggestions for doing so.
