Generative models are everywhere. Imagine speaking into a microphone and having your speech transformed into text! Imagine talking to a chatbot that answers your questions. To create a powerful generative model, the machine must first understand the data, whether in a supervised or unsupervised way. It’s non-negotiable! The generative model needs to understand the data and then build a robust understanding of it, called a data representation in the AI world.

There are tons of generative models out there, from simple statistical models to Generative Adversarial Networks (GANs). In the past couple of years, GANs have attracted a lot of attention. However, they come with challenges that are hard to overcome. Another type of generative model is the autoencoder, which has many applications ranging from data compression to purely generative tasks. The advantage of autoencoders is their interpretability and flexibility.

In this blog post, I am going to explain what autoencoders are, and together we will touch on some simple concepts.

Generative Models and The Importance of Data Representation

Researching and developing generative models is not really new. After all, we can divide Machine Learning (ML) applications into two general categories: (1) discriminative and (2) generative. So I probably do not need to lecture you about why generative models are important. Right? To narrow down the scope of what we are going to discuss here, let’s switch gears to data representation.

In ML, we usually want to represent the data distribution with a (1) meaningful and (2) compact representation. So how are we going to do that? The most commonly heard approach is the classical unsupervised algorithm PCA. Basically, PCA tries to capture the most important information in the data with a representation that is more compact than the raw data. You can read more about it in this great blog post. However, the PCA-generated representation has a linear relationship with the data. What if we want a more complex and more powerful representation? Common sense says we should look for a non-linear relationship. That’s where autoencoders come to our rescue. Shortly, I am going to discuss what they are.


What Is an Autoencoder?

Basically, an autoencoder is a neural network architecture that consists of two components: an encoder and a decoder. An autoencoder:

  1. takes an input x,
  2. generates a latent space (hidden space) h=f(x),
  3. and reconstructs the input from its hidden space by \hat{x} = g(h)=g(f(x)).

Note: The layer in between is called the latent (hidden) space because it is never observed directly, it is latent!!
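The three steps above can be sketched in a few lines of NumPy. This is a toy illustration, not a trained model: the weights are random, and the sizes (8-dimensional input, 3-dimensional latent space) and names (`W_enc`, `W_dec`) are my own choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8-dimensional input, 3-dimensional latent space.
input_dim, latent_dim = 8, 3

# Randomly initialized weights stand in for a trained network.
W_enc = rng.normal(size=(latent_dim, input_dim))
W_dec = rng.normal(size=(input_dim, latent_dim))

def f(x):
    """Encoder: map the input x to the latent space, h = f(x)."""
    return np.tanh(W_enc @ x)

def g(h):
    """Decoder: reconstruct the input from the latent code h."""
    return W_dec @ h

x = rng.normal(size=input_dim)   # 1. take an input x
h = f(x)                         # 2. generate a latent code h = f(x)
x_hat = g(h)                     # 3. reconstruct: \hat{x} = g(f(x))

print(h.shape, x_hat.shape)      # (3,) (8,)
```

Note that the latent code here has fewer dimensions than the input, which is exactly the "compact representation" idea from the previous section.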

Ok, now that we know what an autoencoder does, we can just create a latent space which is supposed to represent the data in a meaningful and compact way. Right? If it were that simple, our lives would never be hard!! To explain what happens next, I need to introduce undercomplete autoencoders, which are the simplest form of autoencoders.

Undercomplete Autoencoder

In an undercomplete autoencoder, the latent space has fewer dimensions than the input (that is what makes it undercomplete), and we simply try to minimize the following loss term:

    \[\ell(x, \hat{x}), \quad \text{where } \hat{x} = g(f(x))\]
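As a concrete sketch, here is the reconstruction loss with \ell chosen as the mean squared error. The vectors below are made-up numbers just to show the computation; `x_hat` stands in for the output g(f(x)) of some autoencoder.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return np.mean((x - x_hat) ** 2)

x = np.array([1.0, 2.0, 3.0])
x_hat = np.array([1.1, 1.9, 3.2])   # hypothetical reconstruction g(f(x))

print(round(reconstruction_loss(x, x_hat), 6))   # 0.02
```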

The \ell loss function is usually the mean squared error between x and its reconstructed counterpart \hat{x}. There are a couple of notes about undercomplete autoencoders:

  1. The loss term is pretty simple and easy to optimize.
  2. The training is straightforward, and we usually do not get stuck in bad local minima!
  3. Additional structure (for example, regularization terms) can easily be enforced through the loss function.
  4. If both the encoder and the decoder are linear, we have basically built PCA!
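Point 4 can be demonstrated directly. A linear autoencoder that minimizes the MSE ends up spanning the same subspace PCA finds; below I build such an autoencoder by hand, taking the encoder and decoder weights from the top-k principal directions (computed via SVD). The toy data and the variable names (`V`, `H`, `X_hat`) are my own for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 100 samples in 5 dimensions, centered (PCA assumes centered data).
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)

k = 2  # latent dimension (undercomplete: k < 5)

# PCA via SVD: the top-k right singular vectors span the principal subspace.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:k].T                 # (5, k) principal directions

# Linear autoencoder: encoder h = x @ V, decoder x_hat = h @ V.T.
H = X @ V                    # latent codes h = f(x)
X_hat = H @ V.T              # reconstructions \hat{x} = g(f(x))

# This is exactly the rank-k PCA reconstruction, which minimizes the MSE
# among all rank-k linear maps (Eckart–Young theorem).
mse = np.mean((X - X_hat) ** 2)
print(mse < np.mean(X ** 2))   # True: better than reconstructing with zeros
```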

However, there are BIG issues with undercomplete autoencoders:

  1. The model can overfit easily.
  2. The training can succeed, but who knows whether the autoencoder learned any meaningful information? In fact, the chance that no useful information was learned is really high!!

In autoencoders, we usually do not care about the output of the decoder; the latent space is what really matters. The undercomplete autoencoder, however, does not impose much on the latent space. It only cares about the reconstruction.

So what’s the solution? Many different autoencoder types have been proposed to resolve these issues. In future posts, we will discuss them and their pros and cons.


In this post, you learned about autoencoders in general and their goals. What is important is to know why we care about compressed data representations, why we care about the latent space, and why autoencoders come to our rescue. I hope you enjoyed it.