What are generative adversarial networks?
Generative adversarial networks (GANs) are an exciting recent innovation in machine learning. GANs are generative models: they create new data instances that resemble your training data.
They are algorithmic architectures that use two neural networks, pitting one against the other (thus the “adversarial”) in order to generate new, synthetic instances of data that can pass for real data. They are used widely in image generation, video generation, and voice generation.
For example, GANs can create images that look like photographs of human faces, even though the faces don't belong to any real person.
How do generative adversarial networks (GANs) work?
One neural network, called the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e. the discriminator decides whether each instance of data that it reviews belongs to the actual training dataset or not.
Let’s say we’re trying to do something more banal than mimic the Mona Lisa. We’re going to generate hand-written numerals like those found in the MNIST dataset, which is taken from the real world. The goal of the discriminator, when shown an instance from the true MNIST dataset, is to recognize it as authentic.
Meanwhile, the generator is creating new, synthetic images that it passes to the discriminator. It does so in the hopes that they, too, will be deemed authentic, even though they are fake. The goal of the generator is to generate passable hand-written digits: to lie without being caught. The goal of the discriminator is to identify images coming from the generator as fake.
Here are the steps a GAN takes:
- The generator takes in random numbers and returns an image.
- This generated image is fed into the discriminator alongside a stream of images taken from the actual, ground-truth dataset.
- The discriminator takes in both real and fake images and returns a probability for each one: a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing a fake.
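The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a working GAN: both networks are reduced to a single untrained linear layer followed by a sigmoid, the names `generator`, `discriminator`, `NOISE_SIZE`, and `IMAGE_SIZE` are hypothetical, and the "real" image is a random stand-in rather than an actual MNIST digit.

```python
import math
import random

IMAGE_SIZE = 28 * 28   # a flattened MNIST-sized image (assumption for illustration)
NOISE_SIZE = 16        # length of the generator's random input (hypothetical value)

rng = random.Random(0)


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Untrained toy weights: one linear layer per network.
G_WEIGHTS = [[rng.gauss(0.0, 0.1) for _ in range(NOISE_SIZE)]
             for _ in range(IMAGE_SIZE)]
D_WEIGHTS = [rng.gauss(0.0, 0.1) for _ in range(IMAGE_SIZE)]


def generator(noise):
    """Step 1: take in random numbers and return an image (pixels in (0, 1))."""
    return [sigmoid(sum(w * z for w, z in zip(row, noise))) for row in G_WEIGHTS]


def discriminator(image):
    """Step 3: return the estimated probability that the image is real."""
    return sigmoid(sum(w * p for w, p in zip(D_WEIGHTS, image)))


noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_SIZE)]
fake_image = generator(noise)

# Stand-in for an image drawn from the ground-truth dataset.
real_image = [rng.random() for _ in range(IMAGE_SIZE)]

# Step 2: the discriminator is shown both the real and the generated image.
p_real = discriminator(real_image)
p_fake = discriminator(fake_image)
print(f"P(real | real image) = {p_real:.3f}")
print(f"P(real | fake image) = {p_fake:.3f}")
```

Because nothing has been trained, the two probabilities are meaningless here; the point is only the shape of the data flow, from noise to image to a single number between 0 and 1.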
So you have a double feedback loop:
- The discriminator is in a feedback loop with the ground truth of the images, which we know.
- The generator is in a feedback loop with the discriminator.
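One common way to express these two feedback loops is with binary cross-entropy losses. The text above does not name a loss function, so treat the following as an assumption: `discriminator_loss` and `generator_loss` are hypothetical helper names, and the generator loss shown is the widely used "non-saturating" variant. Inputs are assumed to be probabilities strictly between 0 and 1.

```python
import math


def discriminator_loss(p_real, p_fake):
    """Feedback loop 1: score the discriminator against the known labels.

    p_real is its output on a real image (target 1),
    p_fake is its output on a generated image (target 0).
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake):
    """Feedback loop 2: the generator is scored only through the discriminator.

    The loss is low when the discriminator rates the fake as likely real.
    """
    return -math.log(p_fake)


# A discriminator that is mostly right: low loss for it, high loss for the generator.
print(discriminator_loss(p_real=0.9, p_fake=0.1), generator_loss(p_fake=0.1))

# A discriminator that is being fooled: the situation is reversed.
print(discriminator_loss(p_real=0.9, p_fake=0.9), generator_loss(p_fake=0.9))
```

In training, each network's weights would be updated to reduce its own loss, which is what sets the two against each other.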
What are generative adversarial networks used for?
1. Image-to-Image Translation
This is a catch-all category for papers that present GANs capable of many image translation tasks.
Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” demonstrate GANs, specifically their pix2pix approach, for many image-to-image translation tasks.
Examples include translation tasks such as:
- Translation of semantic images to photographs of cityscapes and buildings.
- Translation of satellite photographs to Google Maps.
- Translation of photos from day to night.
- Translation of black and white photographs to color.
- Translation of sketches to color photographs.
2. 3D Object Generation
Jiajun Wu, et al. in their 2016 paper titled “Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling” demonstrate a GAN for generating new three-dimensional objects (e.g. 3D models) such as chairs, cars, sofas, and tables.
Matheus Gadelha, et al. in their 2016 paper titled “3D Shape Induction from 2D Views of Multiple Objects” use GANs to generate three-dimensional models given two-dimensional pictures of objects from multiple perspectives.
3. Clothing Translation
Donggeun Yoo, et al. in their 2016 paper titled “Pixel-Level Domain Transfer” demonstrate the use of GANs to generate photographs of clothing as may be seen in a catalog or online store, based on photographs of models wearing the clothing.
4. Photos to Emojis
Yaniv Taigman, et al. in their 2016 paper titled “Unsupervised Cross-Domain Image Generation” used a GAN to translate images from one domain to another, including from street numbers to MNIST handwritten digits, and from photographs of celebrities to what they call emojis or small cartoon faces.
5. Text-to-Image Translation (text2image)
Han Zhang, et al. in their 2016 paper titled “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks” demonstrate the use of GANs, specifically their StackGAN, to generate realistic-looking photographs from textual descriptions of simple objects like birds and flowers.
Scott Reed, et al. in their 2016 paper titled “Generative Adversarial Text to Image Synthesis” also provide an early example of text to image generation of small objects and scenes including birds, flowers, and more.
In another 2016 paper titled “Learning What and Where to Draw”, Scott Reed, et al. expand upon this capability, using GANs both to generate images from text and to take bounding boxes and key points as hints about where to draw a described object, like a bird.
What are the applications of generative adversarial networks?
Some of the applications and use cases of GANs include:
Improving cybersecurity
One of the methods that hackers use is known as an adversarial attack. The attackers manipulate images by adding malicious data to them, which tricks the neural network and compromises the intended behavior of the algorithm.
It is possible to train generative adversarial networks to identify such instances of fraud. GANs can make deep learning models more robust and identify any malicious information that might be added to images by hackers.
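To make the adversarial attack described above concrete, here is a toy sketch. It does not involve a GAN or a real neural network: the "model" is a hypothetical four-pixel linear classifier with made-up weights, and the perturbation follows the sign of each weight, in the spirit of the fast gradient sign method, which the text does not name. It only shows how a small, bounded change to every pixel can flip a model's decision.

```python
def score(weights, image):
    """Linear classifier: a positive score means class A, a negative one class B."""
    return sum(w * x for w, x in zip(weights, image))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, image, epsilon):
    """Move each pixel by at most epsilon in the direction that lowers the score."""
    return [x - epsilon * sign(w) for w, x in zip(weights, image)]


# Made-up weights and a made-up four-pixel "image" (illustrative values only).
weights = [0.5, -0.3, 0.8, -0.6]
image = [0.6, 0.5, 0.4, 0.5]

adversarial = perturb(weights, image, epsilon=0.1)

print("original score:   ", score(weights, image))
print("adversarial score:", score(weights, adversarial))
print("largest pixel change:",
      max(abs(a - b) for a, b in zip(image, adversarial)))
```

The original image scores positive and the perturbed one scores negative, even though no pixel moved by more than 0.1. A defense would aim to make the model's decision stable under changes of this kind.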
Generating animation models
Generative adversarial networks can be used to automatically generate the 3D models needed in video games, animated movies, or cartoons. The network can create new 3D models from an existing dataset of 2D images, analyzing the 2D photos to quickly reconstruct the objects in 3D. This saves animators a lot of time and allows them to focus on other tasks.
Editing photographs
This goes beyond regular photo-editing enhancements. GANs can reconstruct images of faces to identify changes in features such as hair color, facial expression, or gender. They can even generate facial images of people at various ages.