In this tutorial, we show how to train and evaluate a Linear Regression model using TensorFlow. Linear Regression is one of the fundamental and most frequently used Machine Learning techniques. In this tutorial, you will learn:

  • the concept of Linear Regression
  • the particular case of Linear Regression with one variable
  • a working example using a well-known dataset
  • how to implement this algorithm in Python with TensorFlow and Keras
  • how to work with the dataset using a powerful library such as Pandas
  • how to investigate and visualize the data

Here, we investigate Linear Regression with one variable, in which only one dependent and one independent variable are present. We discuss the concept of dependent and independent variables later in this post.


An Introduction to Linear Regression

In machine learning and statistics, Linear Regression is categorized as a supervised learning method. It aims to model the linear relationship between a dependent variable Y and at least one independent variable X. The linear relationship is modeled by a predictor function, called a Linear Model, whose parameters are estimated from the data. The main advantage of the Linear Regression algorithm is its simplicity.

Linear Regression belongs to the general category of regression analysis, which embraces different kinds of algorithms such as Linear Regression, Logistic Regression, Ridge Regression, and Lasso Regression, to name a few. In general, regression analysis is a kind of predictive modeling method that examines the relationship between a dependent (target) variable and one or more independent (explanatory) variables.
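To make the idea concrete, here is a minimal sketch of one-variable linear regression solved in closed form with NumPy. The data are synthetic and the names (x, y, w1, w0) are assumptions made for illustration; this is not the TensorFlow approach used in the rest of the tutorial.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)

# Ordinary least squares for one variable:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()

print(f"estimated slope w1 = {w1:.3f}, intercept w0 = {w0:.3f}")
```

The estimates should land close to the slope and intercept used to generate the data, which is what "estimating the parameters from the data" means in practice.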

A Dataset for Linear Regression

We conduct our experiments using the Boston house prices dataset, a small dataset that keeps the experimental setup simple. The goal of our Linear Regression model is to predict the median value of owner-occupied homes. We can download the data as below:

import pandas as pd
from tensorflow import keras
# Download the dataset with keras.utils.get_file
dataset_path = keras.utils.get_file("housing.data", "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")

Becoming Familiar with Data

The characteristics and attributes of the dataset are as below:

Characteristics

  • Number of Instances: 506
  • The first 13 features are numeric/categorical predictive features.
  • The last one (attribute 14): Median Value is the target variable.

Attributes

  1. CRIM: per capita crime rate by town
  2. ZN: the proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS: the proportion of non-retail business acres per town
  4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX: nitric oxides concentration (parts per 10 million)
  6. RM: average number of rooms per dwelling
  7. AGE: the proportion of owner-occupied units built prior to 1940
  8. DIS: weighted distances to five Boston employment centers
  9. RAD: index of accessibility to radial highways
  10. TAX: full-value property-tax rate per $10,000
  11. PTRATIO: pupil-teacher ratio by town
  12. B: 1000 \times (Bk - 0.63)^2 where Bk is the proportion of blacks by town
  13. LSTAT: % lower status of the population
  14. MEDV: Median value of owner-occupied homes in $1000’s [target attribute]

Let’s explore the data. The first step is to show some of the data samples:

column_names = ['CRIM','ZN','INDUS','CHAS','NOX',
                'RM', 'AGE', 'DIS','RAD','TAX','PTRATIO', 'B', 'LSTAT', 'MEDV']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)
# Create a dataset instance
dataset = raw_dataset.copy()
# This function returns the last n rows from the object
# based on position.
dataset.tail(n=10)

Using the Pandas library, we defined the column names from the attribute list (column_names) and created the data frame by reading the downloaded dataset (pd.read_csv). Displaying the last 10 rows of the data with dataset.tail(n=10) should give the following output:

data exploration

Data Processing

We now split the data into training and test sets.

# Split data into train/test
# p = training data portion
p=0.8
trainDataset = dataset.sample(frac=p,random_state=0)
testDataset = dataset.drop(trainDataset.index)

Above, we took a portion p of the data for training (trainDataset) and the remaining samples for testing (testDataset). Here, we desire to model the relationship between the dependent variable and the independent variable. In Linear Regression with one variable, we have only one independent and one dependent variable:

  • Independent variable: ‘RM’ [see attributes]
  • Dependent variable: ‘MEDV’ [see attributes]
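The sample/drop pattern used for the split can be sanity-checked on a small frame. The sketch below uses a toy DataFrame with made-up values (an assumption, standing in for the housing data) and checks that the two parts do not share any row.

```python
import pandas as pd

# Toy frame standing in for the housing data (hypothetical values)
toy = pd.DataFrame({"RM": [5.0, 6.0, 6.5, 7.0, 8.0],
                    "MEDV": [15.0, 20.0, 23.0, 30.0, 45.0]})

# Same pattern as in the tutorial: sample a fraction for training,
# then drop those rows to obtain the test set
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

# The two index sets should be disjoint and together cover every row
shared_rows = set(train.index) & set(test.index)
print(len(train), len(test), shared_rows)
```

Because the test set is built by dropping the sampled indices, no row can end up in both parts.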

Simply put, we want to predict the median value of owner-occupied homes (in $1000’s) [target attribute] based on the average number of rooms per dwelling (RM). Let’s plot MEDV against RM, i.e., visualize how MEDV changes as RM changes. Basically, we have MEDV=f(RM) and we desire to estimate the function f(.) using Linear Regression.

# Visual representation of training data
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Label indexing reads the columns without removing them from the frame.
x = trainDataset['RM']
y = trainDataset['MEDV']
ax.scatter(x, y, edgecolors=(0, 0, 0))
ax.set_xlabel('RM')
ax.set_ylabel('MEDV')
plt.show()

From the train and test data, we should extract the inputs and labels for the one-variable Linear Regression experiment. We can use two approaches to access the data columns:

  1. The pop command: It returns an item and drops it from the frame. After using trainDataset.pop('RM'), the 'RM' column does not exist in the trainDataset frame anymore!
  2. Indexing with labels, for example trainDataset['RM'].

We use approach (2) as below:

# We use label indexing instead of pop: pop returns a column and drops it
# from the frame, so after trainDataset.pop('RM') the 'RM' column would
# not exist in the trainDataset frame anymore.
trainInput = trainDataset['RM']
trainTarget = trainDataset['MEDV']
testInput = testDataset['RM']
testTarget = testDataset['MEDV']
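The difference between the two approaches can be seen on a toy frame. The frame and its values below are hypothetical and only serve to show which columns remain after each operation.

```python
import pandas as pd

# Toy frame with made-up values
toy = pd.DataFrame({"RM": [6.0, 7.0], "MEDV": [20.0, 30.0]})

# Approach 2: label indexing returns the column and leaves the frame unchanged
rm_by_label = toy["RM"]
columns_after_indexing = list(toy.columns)

# Approach 1: pop returns the column and removes it from the frame
rm_by_pop = toy.pop("RM")
columns_after_pop = list(toy.columns)

print(columns_after_indexing)
print(columns_after_pop)
```

Label indexing is the safer choice here because the same frame is reused later for plotting.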

Implementation

We assume we have the linear model y= w_1 x+ w_0 in which w_0 and w_1 are two unknown parameters that represent the intercept and slope of the line. In our implementation, we desire to obtain an estimate of this linear model as \hat{y}= \hat{w_1} x+ \hat{w_0}.

For our dataset, we have pairs (x_i,y_i), where x_i and y_i are the input and target values, respectively. To train our model, we have the following general schema:

The parameters x, y, and \hat{y} are the input, target (output), and the predicted output by the model, respectively.
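The schema (predict, compare with the target, update the parameters) can be sketched without any framework. The example below is a minimal illustration that uses plain gradient descent on the mean squared error, not the Adam optimizer used later in this tutorial, and it runs on synthetic data; the names w1_hat, w0_hat, and learning_rate are assumptions.

```python
import numpy as np

# Synthetic pairs (x_i, y_i), assumed for illustration: y = 3x + 2 plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.05, size=100)

w1_hat, w0_hat = 0.0, 0.0   # initial estimates of slope and intercept
learning_rate = 0.1

for _ in range(500):
    y_hat = w1_hat * x + w0_hat          # predicted output
    error = y_hat - y                    # difference from the target
    # Gradients of the mean squared error with respect to w1 and w0
    grad_w1 = 2.0 * np.mean(error * x)
    grad_w0 = 2.0 * np.mean(error)
    w1_hat -= learning_rate * grad_w1
    w0_hat -= learning_rate * grad_w0

print(f"w1_hat = {w1_hat:.3f}, w0_hat = {w0_hat:.3f}")
```

Keras automates exactly this loop: the forward pass, the loss computation, and the parameter update.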

Create the Model

The first step is to create the model, which involves the following:

  1. Defining the architecture of the model
  2. Defining the optimizer
  3. Compiling the model and returning the graph

As mentioned above, we desire to find the parameters (w) that predict the output y from x in a linear fashion:

    \[y= w_1 x+ w_0\]

The above can be defined with the following dense layer:

from tensorflow import keras
from tensorflow.keras import layers
# We don't specify anything for activation -> no activation is applied (i.e. "linear" activation: a(x) = x)
# Check: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
model = keras.Sequential([
      layers.Dense(1, use_bias=True, input_shape=(1,))
    ])
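To see why a single dense unit with a bias and no activation is the same as y = w_1 x + w_0, the computation can be mimicked in NumPy. This is a sketch of the arithmetic only, not TensorFlow code, and the weight values are hypothetical.

```python
import numpy as np

# Hypothetical weight values, chosen only for illustration
kernel = np.array([[9.0]])   # shape (input_dim, units) = (1, 1); plays the role of w_1
bias = np.array([-34.0])     # shape (units,) = (1,); plays the role of w_0

def dense_linear(x):
    """Mimic a single-unit dense layer with linear activation: x @ kernel + bias."""
    return x @ kernel + bias

# A batch of three inputs with one feature each, shape (3, 1)
x_batch = np.array([[5.0], [6.0], [7.0]])
y_batch = dense_linear(x_batch)
print(y_batch.ravel())
```

With one input feature and one unit, the kernel holds a single number (the slope) and the bias holds a single number (the intercept), so the layer has exactly two trainable parameters.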

This is what the model looks like:

Compiling

To compile the model, we should set the following items:

  • Optimizer: We use a stochastic gradient-based optimizer (Adam, a variant of stochastic gradient descent). TensorFlow provides the implementation; we only need to configure it.
  • Model compiling schema: In this step, we define (1) how the model is optimized and (2) which criterion it optimizes. The optimization criterion is called the loss function, and it supervises the training. The linear relationship between the two variables (X,Y) is estimated by solving an optimization problem, which requires a proper loss function.

The compiling phase is as below:

import tensorflow as tf
# Adam optimizer
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.01, beta_1=0.9, beta_2=0.99, epsilon=1e-05, amsgrad=False,
    name='Adam')

# Model compiling settings
model.compile(loss='mse', optimizer=optimizer, metrics=['mae','mse'])

For model compiling, we used:

  1. Adam as the optimizer, which is one of the most widely used methods. The parameter amsgrad is set to False so that the basic Adam optimizer is used rather than its AMSGrad variant.
  2. Mean Squared Error (MSE) as the loss function for optimization.
  3. Both Mean Squared Error (MSE) and Mean Absolute Error (MAE) as metrics for model evaluation.
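For readers curious about what the optimizer settings control, below is a sketch of the textbook Adam update rule in NumPy, using the same learning_rate, beta_1, beta_2, and epsilon values as the compile step. It is an illustration under assumptions: the function adam_step and the toy objective are hypothetical, and the Keras implementation may differ in details such as where epsilon is applied.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, beta_1=0.9, beta_2=0.99, eps=1e-5):
    """One textbook Adam update for a scalar or array parameter w."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction for step t
    v_hat = v / (1.0 - beta_2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2, a toy objective assumed for illustration
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)

print(f"w after 2000 steps: {w:.3f}")
```

Note that the very first step has a size of roughly learning_rate regardless of the gradient's magnitude, because the gradient is divided by the square root of its own square.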

What are the MSE and MAE metrics? Assume X and Y are independent and dependent variables, respectively. The goal of the model is to predict Y given X and the model prediction is \hat{Y}. Then, we have the following definitions:

    \[MSE = \frac{1}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}(\hat{Y_i}-Y_i)^2\]

    \[MAE = \frac{1}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}|\hat{Y_i}-Y_i|\]

where \mathcal{N} is the number of samples. Since we picked MSE as the loss function, training aims to minimize the squared differences between the real output (Y) and the predicted output (\hat{Y}).
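As a small worked example of the two definitions, the sketch below computes both metrics with NumPy on three hypothetical target/prediction pairs chosen for easy arithmetic.

```python
import numpy as np

# Hypothetical targets and predictions, chosen for easy arithmetic
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_pred - y_true             # [-1, 0, 2]
mse = np.mean(errors ** 2)           # (1 + 0 + 4) / 3
mae = np.mean(np.abs(errors))        # (1 + 0 + 2) / 3

print(f"MSE = {mse:.4f}, MAE = {mae:.4f}")
```

The single error of size 2 contributes four times as much to the MSE as the error of size 1, but only twice as much to the MAE, which is why MSE penalizes large errors more heavily.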

Training

The next step is setting up the actual training phase. To do so, we have the following parameters:

  • n_epochs: number of epochs
  • batch_size: number of samples per batch, as the training is conducted with mini-batch optimization.
  • validation_split: keep a portion of the training data for unbiased validation. The validation set is NOT the test set. During training, instead of relying only on the training set evaluation, we evaluate our model on the validation set, as it gives more insight into how the model is improving.
  • verbose: set to 0 so that Keras prints no per-epoch progress; the short summary we want is printed by the custom callback instead.
  • callbacks: A callback is a tool to customize the behavior of the model during training, testing, etc.
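To get a feel for these numbers, the arithmetic below estimates how many samples and mini-batches are involved when batch_size=256 and validation_split=0.1 are applied to this dataset. It is a back-of-the-envelope sketch: the rounding rules are assumptions and are marked as such in the comments.

```python
import math

n_total = 506                 # instances in the dataset (from the description above)
train_fraction = 0.8          # p used for the train/test split
validation_split = 0.1
batch_size = 256

# Assumption: the sampled training frame has round(n_total * train_fraction) rows
n_train_frame = round(n_total * train_fraction)

# Assumption: the validation portion is held out with the split point at
# floor(n * (1 - validation_split))
n_fit = math.floor(n_train_frame * (1.0 - validation_split))
n_val = n_train_frame - n_fit

batches_per_epoch = math.ceil(n_fit / batch_size)
print(n_train_frame, n_fit, n_val, batches_per_epoch)
```

Under these assumptions each epoch consists of only a couple of parameter updates, which helps explain why a large number of epochs is used.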

Having the above parameters, the training phase is done with TensorFlow as below:

# A mechanism that stops training if the validation loss has not improved
# by at least min_delta for n_idle_epochs consecutive epochs.
n_idle_epochs = 100
earlyStopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=n_idle_epochs, min_delta=0.01)

# Creating a custom callback to print the log after a certain number of epochs
class NEPOCHLogger(tf.keras.callbacks.Callback):
    def __init__(self, per_epoch=100):
        '''
        per_epoch: number of epochs to wait between printed log lines
        '''
        super().__init__()
        self.seen = 0
        self.per_epoch = per_epoch

    def on_epoch_end(self, epoch, logs=None):
        if epoch % self.per_epoch == 0:
            print('Epoch {}, loss {:.2f}, val_loss {:.2f}, mae {:.2f}, val_mae {:.2f}, mse {:.2f}, val_mse {:.2f}'\
                  .format(epoch, logs['loss'], logs['val_loss'], logs['mae'], logs['val_mae'], logs['mse'], logs['val_mse']))

# Instantiate the logger
log_display = NEPOCHLogger(per_epoch=100)
# Training loop
n_epochs = 2000
history = model.fit(
  trainInput, trainTarget, batch_size=256,
  epochs=n_epochs, validation_split=0.1, verbose=0, callbacks=[earlyStopping, log_display])

Let’s explain the above code.

  • Early stopping (n_idle_epochs and earlyStopping): We set the early stopping mechanism, which ends training when continuing is no longer useful. Why? Assume we are training a model and evaluating it on the validation set. After some time, we realize that there is no improvement in the validation loss. In this scenario, what is the point of continuing the training? The question is how patient we are. The parameter n_idle_epochs defines our patience: if the validation loss has not improved by at least min_delta=0.01 for n_idle_epochs epochs, the training is stopped. Check the tf.keras.callbacks.EarlyStopping class for further details.
  • Custom callback (NEPOCHLogger): I created a custom callback to print the results every 100 epochs. That may seem like a lot of code for simple printing, but note that it is very good practice to work with custom callbacks, as they are very useful when you are working with TensorFlow and Keras.
  • Training loop (model.fit): The training loop trains the model for up to n_epochs = 2000 epochs using the model.fit method. The parameter batch_size=256 determines the number of samples for mini-batch optimization. The validation_split is a float in the range [0,1] that gives the portion of the training data used as validation data. The model will NOT use this portion for training!
The TensorFlow Graph.
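The patience logic behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration and not the Keras implementation; the function early_stopping_epoch and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience, min_delta):
    """Return the index of the epoch at which training would stop, or None.

    Simplified rule: an epoch counts as an improvement only if the loss
    drops below the best value seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stalls after the third epoch
losses = [10.0, 8.0, 7.0, 6.995, 6.999, 6.992, 6.998]
stop_at = early_stopping_epoch(losses, patience=3, min_delta=0.01)
print(stop_at)
```

In this toy run, the changes after the third epoch are all smaller than min_delta, so none of them resets the counter and training stops once the patience is used up.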

The model.fit method returns a history object (a callback) for each model. This object stores useful information that we desire to extract and visualize. Let’s explore what is inside history:

print('keys:', history.history.keys())

The above code returns the following:

keys: dict_keys(['loss', 'mae', 'mse', 'val_loss', 'val_mae', 'val_mse'])

which are the training and validation values of the loss and the metrics. Let’s visualize the MAE for training and validation with the code below:

import numpy as np
import pandas as pd
import seaborn as sns
# Returning the desired values for plotting and turn to numpy array
mae = np.asarray(history.history['mae'])
val_mae = np.asarray(history.history['val_mae'])
# Creating the data frame
num_values = len(mae)
values = np.zeros((num_values,2), dtype=float)
values[:,0] = mae
values[:,1] = val_mae
# Using pandas to frame the data
steps = pd.RangeIndex(start=0,stop=num_values)
data = pd.DataFrame(values, steps, columns=["mae", "val_mae"])
# Plotting
sns.set(style="whitegrid")
sns.lineplot(data=data, palette="tab10", linewidth=2.5)

It should return something similar to the below image:

After a while, the training loss becomes better than the validation loss. Early stopping ended the training before it reached 2000 epochs!
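As a side note, since history.history is a dictionary of equal-length lists, the same frame can be built more directly. The sketch below uses a hypothetical stand-in dictionary with made-up values so that it runs without a trained model.

```python
import pandas as pd

# Hypothetical stand-in for history.history (values are made up)
history_dict = {
    "loss": [90.0, 60.0, 45.0],
    "mae": [7.0, 5.5, 4.8],
    "mse": [90.0, 60.0, 45.0],
    "val_loss": [95.0, 66.0, 50.0],
    "val_mae": [7.4, 5.9, 5.1],
    "val_mse": [95.0, 66.0, 50.0],
}

# A dict of equal-length lists maps directly to columns; the index is the epoch
curves = pd.DataFrame(history_dict)[["mae", "val_mae"]]
curves.index.name = "epoch"
print(curves)
```

The resulting frame can be passed to sns.lineplot(data=curves) in the same way as the frame built from NumPy arrays.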

Evaluation

Once the model is trained, it is time for evaluation. The evaluation code is as follows:

predictions = model.predict(testInput).flatten()
a = plt.axes(aspect='equal')
plt.scatter(testTarget, predictions, edgecolors=(0, 0, 0))  # x: true values, y: predictions
plt.xlabel('True Values')
plt.ylabel('Predictions')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)

Here, testInput is the test data and the predictions variable holds the model output. We compare the ground truth testTarget, which represents the actual values (Y), with the predicted values (\hat{Y}). This is done with the plt.scatter call. The result is as follows:

Linear Regression model fitting
The scatter plot shows the relationship between true and predicted values of MEDV in the dataset.

As can be observed, the model did its best to fit the data, but it is simply not that powerful! More tweaking might be necessary to fit the model.

Note that we used the very simple y=w_1 x + w_0 as the linear model. It is useful to see how well this line fits the test data. The figure below shows the model improvement.

The testInput and testTarget variables are plotted together and show the input data (RM) versus the target values (MEDV).

As can be observed above, the model is improving to fit the test data. Note that we have shown how well the model works on the test data, NOT the training data. The training data is used to find the optimal model, but the model should ultimately work for the test data!
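A line of this kind can be drawn from just two points once the two parameters are known. The sketch below uses hypothetical parameter values and an assumed RM range; the comments indicate how the values could be read from the trained Keras model.

```python
import numpy as np

# Hypothetical parameter values; with the trained Keras model they could be read as
#   kernel, bias = model.layers[0].get_weights()
#   w1, w0 = float(kernel[0, 0]), float(bias[0])
w1, w0 = 9.0, -34.0

# Evaluate the line at the two ends of an assumed RM range
rm_range = np.array([4.0, 9.0])
medv_line = w1 * rm_range + w0
print(medv_line)

# With matplotlib, the line could be drawn over the test scatter plot:
#   plt.scatter(testInput, testTarget, edgecolors=(0, 0, 0))
#   plt.plot(rm_range, medv_line, color='red')
```

Repeating this at several points during training would show the line moving toward the data as the parameters improve.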

Conclusion

In this tutorial, we walked through one of the most basic and important regression analysis methods, called Linear Regression. Linear Regression aims to find the dependency of a target variable on one or more independent variables. Here, we investigated simple Linear Regression, i.e., the case where the target variable Y depends on only one variable X. You learned how to use TensorFlow to train and evaluate a Linear Regression model.
