What is a Boltzmann machine?
A Boltzmann machine is an unsupervised deep learning model in which every node is connected to every other node. It is a type of recurrent neural network whose nodes make stochastic binary decisions, each with some level of bias.
These machines are not deterministic deep learning models; they are stochastic, generative models. Rather than mapping inputs to outputs, they learn a probability distribution that represents the system being modeled.
A Boltzmann machine has two kinds of nodes:
- Visible nodes: nodes that can be, and are, measured.
- Hidden nodes: nodes that cannot be, or are not, measured.
According to some experts, a Boltzmann machine can be described as a stochastic Hopfield network with hidden units. It is a network of units with an ‘energy’ defined for the network as a whole.
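Written out, the energy of a joint configuration takes a standard form in terms of the binary unit states s_i, the connection weights w_ij and the unit biases θ_i:

```latex
E(s) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i \theta_i s_i
```

Lower-energy configurations are the ones the network treats as more probable.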
Boltzmann machines seek to reach thermal equilibrium: they essentially try to optimize the global distribution of energy across the network. The ‘temperature’ and ‘energy’ of the system are analogies borrowed from thermodynamics; they are not literal physical quantities.
A Boltzmann machine has a learning algorithm that enables it to discover interesting features in datasets composed of binary vectors. The learning algorithm tends to be slow in networks with many layers of feature detectors, but it can be made much faster by learning one layer of feature detectors at a time.
They use stochastic binary units to reach a probability distribution at equilibrium (by minimizing energy). It is possible to combine multiple Boltzmann machines into far more sophisticated systems, such as deep belief networks.
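Concretely, each stochastic binary unit switches on with a probability given by the logistic function of its energy gap ΔE_i, scaled by the temperature T (the standard update rule, stated here for reference):

```latex
\Delta E_i = \sum_j w_{ij} s_j + \theta_i, \qquad p(s_i = 1) = \frac{1}{1 + e^{-\Delta E_i / T}}
```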
The Boltzmann machine is named after Ludwig Boltzmann, the Austrian physicist who came up with the Boltzmann distribution. However, this type of network was first developed by Geoffrey Hinton, working with Terry Sejnowski.
What is the Boltzmann distribution?
The Boltzmann distribution is a probability distribution that gives the probability of a system being in a certain state as a function of that state's energy and the temperature of the system.
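In its usual form, the probability p_i of finding the system in a state i with energy E_i at temperature T is:

```latex
p_i = \frac{e^{-E_i/(kT)}}{\sum_j e^{-E_j/(kT)}}
```

where k is the Boltzmann constant and the sum in the denominator runs over all possible states of the system.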
It was formulated by Ludwig Boltzmann in 1868 and is also known as the Gibbs distribution.
What are Boltzmann machines used for?
The main aim of a Boltzmann machine is to optimize the solution of a problem: it optimizes the weights and quantities related to the specific problem assigned to it. Supervised techniques are employed when the main aim is to learn a mapping from attributes to target variables in the data. If you instead seek to identify an underlying structure or pattern within the data, unsupervised learning methods, which is where this model belongs, are regarded as more useful. Some of the most widely used unsupervised learning methods are clustering, dimensionality reduction, anomaly detection and the creation of generative models.
Each of these techniques has a different objective in detecting patterns, such as identifying latent groupings, finding irregularities in the data, or generating new samples from the data that is available. You can even stack these networks in layers to build deep neural networks that capture highly complicated statistics. Restricted Boltzmann machines are widely used in imaging and image processing because they can model the continuous data that is common in natural images. They have even been used to solve complicated quantum-mechanical many-particle problems and classical statistical physics problems, such as the Ising and Potts classes of models.
How does a Boltzmann machine work?
Boltzmann machines are non-deterministic (stochastic) generative deep learning models that have only two kinds of nodes: hidden and visible nodes. They don’t have any output nodes, and that is what gives them their non-deterministic character. They learn patterns without the typical 1-or-0 target outputs through which conventional networks learn, with their weights optimized using stochastic gradient descent.
A major difference is that, unlike traditional networks (artificial, convolutional, or recurrent neural networks), which have no connections between their input nodes, Boltzmann machines do have connections among the input nodes. Every node is connected to every other node, irrespective of whether it is an input or a hidden node. This enables the nodes to share information among themselves and to self-generate subsequent data. You only measure what is on the visible nodes, not what is on the hidden nodes. Once the input is provided, Boltzmann machines are able to capture the parameters, patterns and correlations in the data. It is because of this that they are known as deep generative models and fall into the class of unsupervised deep learning.
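As a rough illustration of these dynamics, the sketch below runs Gibbs sampling on a tiny fully connected network of binary units. The network size, random weights and temperature are made-up values for demonstration, not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected Boltzmann machine: 6 units (say, 4 visible + 2 hidden).
# Sizes and parameter values here are illustrative assumptions.
n = 6
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2          # symmetric weights
np.fill_diagonal(W, 0.0)   # no self-connections
theta = rng.normal(scale=0.1, size=n)  # unit biases
T = 1.0                    # "temperature" (an analogy, not a physical quantity)

s = rng.integers(0, 2, size=n).astype(float)  # random binary start state

def gibbs_sweep(s):
    """Update every unit once, in random order, using its energy gap."""
    for i in rng.permutation(n):
        delta_e = W[i] @ s + theta[i]          # energy gap for unit i
        p_on = 1.0 / (1.0 + np.exp(-delta_e / T))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# After many sweeps, the states visited approximate the Boltzmann
# distribution over all 2**n configurations of the network.
for _ in range(1000):
    s = gibbs_sweep(s)

print("sample state after equilibration:", s)
```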
What are the types of Boltzmann machines?
There are three types of Boltzmann machines. These are:
- Restricted Boltzmann Machines (RBMs)
- Deep Belief Networks (DBNs)
- Deep Boltzmann Machines (DBMs)
1. Restricted Boltzmann Machines (RBMs)
While in a full Boltzmann machine all the nodes are connected to each other, so that the number of connections grows quadratically with the number of nodes, an RBM places certain restrictions on node connections.
In a Restricted Boltzmann Machine, hidden nodes cannot be connected to each other, and neither can visible nodes; connections run only between the visible layer and the hidden layer.
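This bipartite structure is what makes RBMs fast to work with: given the visible layer, all hidden units can be sampled in parallel, and vice versa. The sketch below shows this with arbitrary layer sizes and untrained random weights; the names and values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Minimal RBM sketch: weights connect visible to hidden units only.
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def sample_hidden(v):
    """Hidden units are not connected to each other, so they can all
    be sampled in parallel given the visible layer."""
    p = sigmoid(v @ W + b_h)
    return (rng.random(n_hidden) < p).astype(float)

def sample_visible(h):
    """Likewise, visible units are conditionally independent given the
    hidden layer."""
    p = sigmoid(W @ h + b_v)
    return (rng.random(n_visible) < p).astype(float)

v = rng.integers(0, 2, size=n_visible).astype(float)
h = sample_hidden(v)
v_recon = sample_visible(h)
print("input:", v)
print("reconstruction:", v_recon)
```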
2. Deep Belief Networks (DBNs)
In a Deep Belief Network, you could say that multiple Restricted Boltzmann Machines are stacked, such that the outputs of the first RBM are the inputs of the subsequent RBM. There are no connections between the nodes within a layer. The connections between the top two layers are undirected (together they form an RBM), while the connections between all of the lower layers are directed, pointing down towards the visible layer.
A deep belief network can either be trained using a Greedy Layer-wise Training Algorithm or a Wake-Sleep Algorithm.
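The greedy layer-wise idea is simple to sketch: train one RBM on the raw data, then train the next RBM on the first one's hidden activations, and so on up the stack. The contrastive-divergence trainer below is a deliberately simplified illustration; the layer sizes, learning rate, epoch count and fake binary data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Toy one-step contrastive divergence (CD-1) trainer for a single RBM.
    A simplified illustration, not a production training recipe."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        ph = sigmoid(data @ W)
        h = (rng.random(ph.shape) < ph).astype(float)
        # Negative phase: one reconstruction step.
        pv = sigmoid(h @ W.T)
        ph_recon = sigmoid(pv @ W)
        # CD-1 weight update (biases omitted for brevity).
        W += lr * (data.T @ ph - pv.T @ ph_recon) / len(data)
    return W

# Greedy layer-wise stacking: each RBM is trained on the hidden
# activations of the RBM below it.
data = rng.integers(0, 2, size=(100, 12)).astype(float)  # fake binary data
layer_sizes = [8, 4]
weights, layer_input = [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)  # feed activations upward

print("trained", len(weights), "stacked RBMs")
```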
3. Deep Boltzmann Machines (DBMs)
Deep Boltzmann Machines are very similar to Deep Belief Networks. The difference between these two types of Boltzmann machines is that while the connections between the lower layers in a DBN are directed, in a DBM all of the connections between layers are undirected, and, as in an RBM, there are no connections within a layer.