In this tutorial, we will introduce how to train and evaluate a Linear Regression model using TensorFlow. Linear Regression is one of the fundamental Machine Learning techniques that are frequently used.
Here, we investigate Linear Regression with one variable, in which only one dependent and one independent variable are present. Later in this post, we discuss the concept of dependent and independent variables.
An Introduction to Linear Regression
In machine learning and statistics, Linear Regression is categorized as a supervised learning method that aims to model the linear relationship between a dependent variable Y and one or more independent variables X. In Linear Regression, the linear relationship is modeled by a predictor function whose parameters are estimated from the data; the result is called a Linear Model. The main advantage of the Linear Regression algorithm is its simplicity.
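To make this concrete, the one-variable predictor function used throughout this post can be written as follows (a minimal LaTeX sketch; the $w_0$, $w_1$ notation is ours and reappears in the implementation section):

$$\hat{y} = w_0 + w_1 x$$

where the intercept $w_0$ and the slope $w_1$ are the parameters estimated from the data.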

Linear Regression belongs to the general category of regression analysis, which embraces different kinds of algorithms such as Linear Regression, Logistic Regression, Ridge Regression, and Lasso Regression, to name a few. In general, regression analysis is a kind of predictive modeling method that examines the relationship between a dependent (target) variable and some independent (explanatory) variables.
A Dataset for Linear Regression

We conduct our experiments using the Boston house prices dataset, a small dataset that is well suited to our experimental setting. The goal of our Linear Regression model is to predict the median value of owner-occupied homes. We can download the data as below:
import tensorflow as tf
from tensorflow import keras

# Download the dataset with keras.utils.get_file
dataset_path = keras.utils.get_file("housing.data",
    "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
Becoming Familiar with Data
The characteristics and attributes of the dataset are as below:
Characteristics
- Number of Instances: 506
- The first 13 features are numeric/categorical predictive features.
- The last one (attribute 14): Median Value is the target variable.
Attributes
- CRIM: per capita crime rate by town
- ZN: the proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: the proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: the proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centers
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: $1000(Bk - 0.63)^2$, where $Bk$ is the proportion of blacks by town
- LSTAT: % lower status of the population
- MEDV: Median value of owner-occupied homes in $1000’s [target attribute]
Let’s explore the data. The first step is to show some of the data samples:
import pandas as pd

column_names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE',
                'DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
raw_dataset = pd.read_csv(dataset_path, names=column_names, na_values="?",
                          comment='\t', sep=" ", skipinitialspace=True)

# Create a dataset instance
dataset = raw_dataset.copy()

# This function returns the last n rows from the object,
# based on position.
dataset.tail(n=10)
Using the Pandas library, we create the data frame by assigning the columns' names from the attributes above and create the data object by reading the downloaded dataset. Displaying the last 10 rows with dataset.tail(n=10) should give the following output:

Data Processing
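Before splitting the data, it is good practice to confirm that parsing produced no missing values. A minimal sketch (the dropna call is a no-op if the frame is already complete):

# Count missing values per column; drop incomplete rows, if any
print(dataset.isna().sum())
dataset = dataset.dropna()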
We should now split the data into train/test splits.

# Split data into train/test
# p = training data portion
p = 0.8
trainDataset = dataset.sample(frac=p, random_state=0)
testDataset = dataset.drop(trainDataset.index)
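As a quick sanity check of the split (506 rows in total, so with p = 0.8 we expect roughly 405 training and 101 test samples):

print(len(trainDataset), len(testDataset))  # expected: 405 101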
Above, we took a portion of the data (p = 0.8, i.e., 80%) for training (line 4) and the remaining samples for testing (line 5). Here, we desire to model the relationship between the dependent variable and the independent variable. In Linear Regression with one variable, we only have one independent and one dependent variable:
- Independent variable: ‘RM’ [see attributes]
- Dependent variable: ‘MEDV’ [see attributes]
In simple words, we want to predict the median value of owner-occupied homes (in $1000's) [target attribute] based on the average number of rooms per dwelling (RM). Let's plot MEDV against RM, i.e., visualize how MEDV changes as RM changes. Basically, we have $y = f(x)$ and we desire to estimate the function $\hat{f}(x)$ using Linear Regression.
# Visual representation of training data
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Extract the associated columns by label indexing
x = trainDataset['RM']
y = trainDataset['MEDV']
ax.scatter(x, y, edgecolors=(0, 0, 0))
ax.set_xlabel('RM')
ax.set_ylabel('MEDV')
plt.show()

From the train and test sets, we should extract the data and labels for the one-variable Linear Regression experiment. We can use two approaches to access the data columns:
- Pop command: it returns an item and drops it from the frame. After using trainDataset.pop('RM'), the 'RM' column no longer exists in the trainDataset frame!
- Indexing with labels: for example, trainDataset['RM'] returns the column and leaves the frame unchanged.
We use approach (2) as below:
# Using indexing with labels (approach 2): the columns are
# returned as Series and remain in the original frames.
trainInput = trainDataset['RM']
trainTarget = trainDataset['MEDV']
testInput = testDataset['RM']
testTarget = testDataset['MEDV']
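For completeness, a minimal sketch of approach (1); since pop mutates the frame, we run it on copies so the original data is preserved:

# Approach (1): .pop() returns a column AND removes it from the frame
trainCopy = trainDataset.copy()
rmColumn = trainCopy.pop('RM')      # 'RM' is no longer in trainCopy
medvColumn = trainCopy.pop('MEDV')  # neither is 'MEDV'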
Implementation
We assume we have the linear model $y = w_0 + w_1 x$, in which $w_0$ and $w_1$ are two unknown parameters that represent the intercept and slope of the line. In our implementation, we desire to obtain an estimate of this linear model as $\hat{y} = \hat{w}_0 + \hat{w}_1 x$.
For our dataset, we have $(x_i, y_i)$ pairs of data, where $x_i$ and $y_i$ are the input and target values, respectively. To train our model, we have the following general schema:

Create the Model
The first step is to create the model as follows:
- Defining the architecture of the model
- Defining the optimizer
- Compiling the model and returning the graph
As mentioned above, we desire to find the parameters (w) that predict the output $y$ from the input $x$ in a linear fashion. This can be defined with the following dense layer:
from tensorflow.keras import layers

# We don't specify anything for activation -> no activation is applied
# (i.e. "linear" activation: a(x) = x)
# Check: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
model = keras.Sequential([
    layers.Dense(1, use_bias=True, input_shape=(1,))
])
This is what the model looks like:

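If the summary image is unavailable, the same information can be printed directly:

# Print the layer types, output shapes, and parameter counts
model.summary()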
Compiling
To compile the model, we should set the following items:
- Optimizer: We use stochastic gradient descent style optimization. TensorFlow implements the update rules for us; we only need to select and configure an optimizer.
- Model compiling schema: In this step, the job is to define (1) how the model is going to behave in terms of optimization and (2) what criterion it should use for optimization. The criterion used for optimization is called the loss function, and it supervises the training. The linear relationship between the two variables $(x, y)$ is estimated by designing an appropriate optimization problem, whose requirement is a proper loss function.
The compiling phase is as below:
# Adam optimizer
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.01, beta_1=0.9, beta_2=0.99,
    epsilon=1e-05, amsgrad=False, name='Adam')

# Model compiling settings
model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
For model compiling, we used:
- Adam as the optimizer, which is one of the widely used methods. The parameter amsgrad is set to False, as the goal is to use the basic Adam optimizer.
- Mean Squared Error (MSE) as the loss function for optimization.
- Both Mean Squared Error (MSE) and Mean Absolute Error (MAE) metrics for model evaluation.
What are the MSE and MAE metrics? Assume $x$ and $y$ are the independent and dependent variables, respectively. The goal of the model is to predict $y$ given $x$, and the model prediction is $\hat{y}$. Then, we have the following definitions:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

where $N$ is the number of samples. Since we picked MSE as the loss function, the goal of optimization is to minimize the squared differences between the real outputs ($y_i$) and the predicted outputs ($\hat{y}_i$).
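As a quick numeric illustration of these two metrics (toy values, not taken from the dataset):

import numpy as np

y_true = np.array([24.0, 21.6, 34.7])
y_pred = np.array([25.0, 20.0, 30.0])

mse = np.mean((y_true - y_pred) ** 2)    # (1.0 + 2.56 + 22.09) / 3 ≈ 8.55
mae = np.mean(np.abs(y_true - y_pred))   # (1.0 + 1.6 + 4.7) / 3 ≈ 2.43
print(mse, mae)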
Training
The next step is setting up the actual training phase. For doing so, we have the following parameters:
- n_epochs: number of epochs
- batch_size: number of samples per batch as the training is conducted with mini-batch optimization.
- validation_split: keep a portion of training data for unbiased validation. The validation set is NOT the test set. In the middle of training, instead of only relying on the training set evaluation, we evaluate our model on the validation set as it provides more insightful results about how the model is improving.
- verbose: set to 0, as we want a short summary and not all the details!
- callbacks: A callback is a tool to customize the behavior of the model during training, testing, etc.
Having the above parameters, the training phase is done with TensorFlow as below:
# A mechanism that stops training if the validation loss is not improving for more than n_idle_epochs
n_idle_epochs = 100
earlyStopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=n_idle_epochs, min_delta=0.01)

# Creating a custom callback to print the log after a certain number of epochs
class NEPOCHLogger(tf.keras.callbacks.Callback):
    def __init__(self, per_epoch=100):
        '''
        per_epoch: number of epochs to wait before printing the loss
        '''
        self.seen = 0
        self.per_epoch = per_epoch

    def on_epoch_end(self, epoch, logs=None):
        if epoch % self.per_epoch == 0:
            print('Epoch {}, loss {:.2f}, val_loss {:.2f}, mae {:.2f}, val_mae {:.2f}, mse {:.2f}, val_mse {:.2f}'
                  .format(epoch, logs['loss'], logs['val_loss'], logs['mae'], logs['val_mae'], logs['mse'], logs['val_mse']))

# Create the callback object
log_display = NEPOCHLogger(per_epoch=100)

# Training loop
n_epochs = 2000
history = model.fit(
    trainInput, trainTarget, batch_size=256,
    epochs=n_epochs, validation_split=0.1, verbose=0, callbacks=[earlyStopping, log_display])
Let’s explain the above code.
- Lines 1-3: We set up the early stopping mechanism. Early stopping refers to the situation in which we do not want the training to continue. Why? Assume we are training a model and evaluating it on the validation set. After some time, we realize that there is no improvement in the validation loss; what is the point of continuing the training? The question is how patient we are, and the parameter n_idle_epochs clarifies our patience: if for more than n_idle_epochs epochs the improvement is less than min_delta=0.01, the training should be stopped. Check the tf.keras.callbacks.EarlyStopping function for further details.
- Lines 5-20: We create a custom callback mechanism to print the results every 100 epochs. It may seem like a lot of machinery for custom printing, but it is very good practice to work with custom callbacks, as they are very useful when you are working with TensorFlow and Keras.
- Lines 23-26: The training loop, which trains the model for n_epochs = 2000 using model.fit. The parameter batch_size=256 determines the number of samples for mini-batch optimization. The validation_split is a float in the range [0,1]; it is the portion of the training data that will be used as validation data, and the model will NOT use this portion for training!
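Since early stopping may halt training well before n_epochs, the number of epochs actually run can be recovered from the returned history object:

# Each recorded metric has one entry per completed epoch
print('Epochs run:', len(history.history['loss']))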

The model.fit method returns a history object (a callback) for each model. This object stores useful information that we desire to extract and visualize. Let's explore what is inside history:
print('keys:', history.history.keys())
The above code returns the following:
keys: dict_keys(['loss', 'mae', 'mse', 'val_loss', 'val_mae', 'val_mse'])
which are the training and validation losses and metrics. Let's visualize the MAE for training and validation with the code below:
import numpy as np
import pandas as pd
import seaborn as sns

# Return the desired values for plotting and turn them into numpy arrays
mae = np.asarray(history.history['mae'])
val_mae = np.asarray(history.history['val_mae'])

# Create the data frame
num_values = len(mae)
values = np.zeros((num_values, 2), dtype=float)
values[:, 0] = mae
values[:, 1] = val_mae

# Use pandas to frame the data
steps = pd.RangeIndex(start=0, stop=num_values)
data = pd.DataFrame(values, steps, columns=["mae", "val-mae"])

# Plotting
sns.set(style="whitegrid")
sns.lineplot(data=data, palette="tab10", linewidth=2.5)
It should return something similar to the below image:

Evaluation
Once the model is trained, it's time for evaluation. The evaluation code is as follows:

predictions = model.predict(testInput).flatten()
a = plt.axes(aspect='equal')
plt.scatter(testTarget, predictions, edgecolors=(0, 0, 0))
plt.xlabel('True Values')
plt.ylabel('Predictions')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
The testInput variable holds the test data and the predictions variable holds the model output. We should compare the ground-truth values testTarget, which represent the actual values ($y$), with the predicted values ($\hat{y}$). This is done at line 3 with a scatter plot. The result is as follows:

As can be observed, the model did its best to fit the data, but a simple linear model is just not that powerful! More tweaking might be necessary to fit the model better.
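Beyond the scatter plot, the test metrics can also be reported numerically; a small sketch using model.evaluate, which returns the loss and metrics configured in model.compile:

# Returns [loss, mae, mse] as configured in model.compile
test_loss, test_mae, test_mse = model.evaluate(testInput, testTarget, verbose=0)
print('Test MAE: {:.2f}, Test MSE: {:.2f}'.format(test_mae, test_mse))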
Note that we used the very simple $\hat{y} = \hat{w}_0 + \hat{w}_1 x$ as the linear model. It is nice to see how well this line, which is our linear model, fits the test data. The below figure shows the model improvement over the course of training:

The testInput and testTarget variables are plotted together and demonstrate the input data (RM) vs. the target values (MEDV). As can be observed above, the model improves its fit to the test data over the course of training. Note that we have shown how well the model works on the test data, NOT the training data. The training data is used to find the optimal model, but the model should ultimately work well on the test data! A sketch of how such a figure can be produced follows.
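The original figure was presumably generated during training; a minimal sketch that overlays the final learned line on the test data (the weights are read from the single Dense layer defined earlier):

# Extract the learned slope (kernel) and intercept (bias) from the Dense layer
w, b = model.layers[0].get_weights()   # kernel shape (1, 1), bias shape (1,)

fig, ax = plt.subplots()
ax.scatter(testInput, testTarget, edgecolors=(0, 0, 0), label='Test data')
xs = np.linspace(testInput.min(), testInput.max(), 100)
ax.plot(xs, w[0][0] * xs + b[0], color='red', label='Fitted line')
ax.set_xlabel('RM')
ax.set_ylabel('MEDV')
ax.legend()
plt.show()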
Conclusion
In this tutorial, we walked through one of the most basic and important regression analysis methods, called Linear Regression. Linear Regression aims to model the dependency of a target variable on one or more independent variables. Here, we investigated simple Linear Regression, i.e., when the target variable depends on only one variable $x$. You learned how to use TensorFlow to train and evaluate a Linear Regression model.