Generative Adversarial Network (GAN)
Deep learning framework where two neural networks compete to generate realistic data and distinguish real from fake.
What is a Generative Adversarial Network?
A generative adversarial network (GAN) is a deep learning framework consisting of two neural networks competing in a zero-sum game: a generator that creates synthetic data, and a discriminator that distinguishes between real and generated data. This adversarial process drives both networks to improve, resulting in the generator producing increasingly realistic outputs.
Key Characteristics
- Adversarial Training: Two networks compete against each other
- Generative Model: Can generate new data samples
- Unsupervised Learning: Learns from unlabeled data
- Zero-Sum Game: One network's gain is the other's loss
- Minimax Optimization: Solves a minimax game problem
- No Explicit Per-Sample Loss: The generator's training signal comes from the discriminator rather than a hand-designed similarity metric
- High-Quality Outputs: Can generate photorealistic images
- Mode Collapse Risk: Potential to generate limited variety
Architecture Overview
graph LR
A[Random Noise] --> B[Generator Network]
B --> C[Generated Data]
D[Real Data] --> E[Discriminator Network]
C --> E
E --> F[Real/Fake Probability]
F -->|Feedback| B
F -->|Feedback| E
Mathematical Representation
The GAN training objective is a minimax game:
min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]
Where:
- G is the generator network
- D is the discriminator network
- x is real data drawn from the data distribution p_data
- z is random noise drawn from a prior p_z (e.g., a Gaussian)
- D(x) is the discriminator's estimate that real data is real
- G(z) is the generator's output from noise
- D(G(z)) is the discriminator's estimate that generated data is real
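In code, each expectation is estimated over a minibatch, and both terms reduce to binary cross-entropy; a minimal sketch (the non-saturating generator loss shown is the variant commonly used in practice):
# Sketch: the minimax objective expressed as binary cross-entropy terms
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(d_real, d_fake):
    # Maximizing log D(x) + log(1 - D(G(z))) is minimizing BCE against 1s and 0s
    return bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)

def generator_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)) rather than minimize log(1 - D(G(z)))
    return bce(tf.ones_like(d_fake), d_fake)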
Core Components
Generator Network
- Maps random noise to synthetic data
- Typically uses feedforward neural network or CNN architecture
- Learns to generate realistic data samples
- Goal: fool the discriminator
# Simple generator implementation
import tensorflow as tf
from tensorflow.keras import layers

def create_generator(latent_dim, output_dim):
    """Create a generator network"""
    model = tf.keras.Sequential([
        layers.Dense(128, activation='relu', input_dim=latent_dim),
        layers.BatchNormalization(),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(output_dim, activation='tanh')
    ])
    return model
Discriminator Network
- Classifies data as real or fake
- Typically uses CNN for image data, MLP for other data types
- Learns to distinguish real from generated data
- Goal: correctly identify real vs fake data
# Simple discriminator implementation
def create_discriminator(input_dim):
    """Create a discriminator network"""
    model = tf.keras.Sequential([
        layers.Dense(512, input_dim=input_dim),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(256),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
Adversarial Training
# GAN training implementation
import numpy as np

class GAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator
        # Compile the discriminator while it is still trainable; Keras caches
        # the trainable flags at compile time, so it keeps training directly
        self.discriminator.compile(loss='binary_crossentropy',
                                   optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
                                   metrics=['accuracy'])
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build the combined GAN model"""
        # Freeze discriminator during generator training
        self.discriminator.trainable = False
        # GAN model: noise -> generator -> discriminator
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])
        model.compile(loss='binary_crossentropy',
                      optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5))
        return model

    def train(self, X_train, epochs, batch_size, latent_dim):
        """Train the GAN"""
        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))
        for epoch in range(epochs):
            # --- Train discriminator ---
            # Select random batch of real data
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            real_data = X_train[idx]
            # Generate fake data
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            fake_data = self.generator.predict(noise, verbose=0)
            d_loss_real = self.discriminator.train_on_batch(real_data, valid)
            d_loss_fake = self.discriminator.train_on_batch(fake_data, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
            # --- Train generator (via the combined model) ---
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)
            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss[0]} | D accuracy: {100*d_loss[1]}] [G loss: {g_loss}]")
GAN Variants
Deep Convolutional GAN (DCGAN)
# DCGAN generator implementation
def create_dcgan_generator(latent_dim):
    """Create a DCGAN generator for 64x64 images"""
    model = tf.keras.Sequential([
        # Start with a dense layer
        layers.Dense(4*4*512, use_bias=False, input_shape=(latent_dim,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Reshape into a 4x4 feature map
        layers.Reshape((4, 4, 512)),
        # Upsample to 8x8
        layers.Conv2DTranspose(256, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Upsample to 16x16
        layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Upsample to 32x32
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Upsample to 64x64
        layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
    ])
    return model
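Later examples also call create_dcgan_discriminator, which this page never defines; a minimal sketch that mirrors the generator above (the exact layer stack is an assumption):
# Hypothetical DCGAN discriminator for 64x64 RGB images (not defined in the original text)
def create_dcgan_discriminator():
    """Create a DCGAN discriminator for 64x64 images"""
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=(64, 64, 3)),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Conv2D(256, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model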
Wasserstein GAN (WGAN)
# WGAN implementation with weight clipping (the models are assumed to be
# compiled with a Wasserstein loss; see the wiring sketch after this block)
class WGAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build the combined GAN model"""
        # Freeze discriminator (critic) during generator training
        self.discriminator.trainable = False
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])
        return model

    def train(self, X_train, epochs, batch_size, latent_dim, n_critic=5, clip_value=0.01):
        """Train the WGAN"""
        # Adversarial ground truths: +1 for real, -1 for fake (Wasserstein labels)
        valid = np.ones((batch_size, 1))
        for epoch in range(epochs):
            for _ in range(n_critic):
                # --- Train critic ---
                # Select random batch of real data
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                real_data = X_train[idx]
                # Generate fake data
                noise = np.random.normal(0, 1, (batch_size, latent_dim))
                fake_data = self.generator.predict(noise, verbose=0)
                d_loss_real = self.discriminator.train_on_batch(real_data, valid)
                d_loss_fake = self.discriminator.train_on_batch(fake_data, -valid)
                d_loss = 0.5 * np.add(d_loss_fake, d_loss_real)
                # Clip critic weights to enforce the Lipschitz constraint
                for layer in self.discriminator.layers:
                    weights = layer.get_weights()
                    weights = [np.clip(w, -clip_value, clip_value) for w in weights]
                    layer.set_weights(weights)
            # --- Train generator ---
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)
            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss}] [G loss: {g_loss}]")
CycleGAN
# CycleGAN implementation (conceptual)
# InstanceNormalization is not in core Keras; it ships with tensorflow_addons
import tensorflow_addons as tfa
from tensorflow.keras import models

class CycleGAN:
    def __init__(self, input_shape):
        self.input_shape = input_shape
        # Create generators
        self.g_AB = self._create_generator()  # A -> B
        self.g_BA = self._create_generator()  # B -> A
        # Create discriminators
        self.d_A = self._create_discriminator()  # Discriminates real/fake A
        self.d_B = self._create_discriminator()  # Discriminates real/fake B

    def _create_generator(self):
        """Create a generator network for CycleGAN"""
        # Encoder
        inputs = layers.Input(shape=self.input_shape)
        x = layers.Conv2D(32, (7, 7), padding='same')(inputs)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        # Downsampling
        x = layers.Conv2D(64, (3, 3), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Conv2D(128, (3, 3), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        # Residual blocks
        for _ in range(6):
            x = self._residual_block(x)
        # Upsampling
        x = layers.Conv2DTranspose(64, (3, 3), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        # Output
        x = layers.Conv2D(3, (7, 7), padding='same')(x)
        outputs = layers.Activation('tanh')(x)
        return models.Model(inputs, outputs)

    def _residual_block(self, x):
        """Residual block for CycleGAN generator"""
        shortcut = x
        x = layers.Conv2D(128, (3, 3), padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Conv2D(128, (3, 3), padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        return layers.Add()([x, shortcut])

    def _create_discriminator(self):
        """Create a discriminator network for CycleGAN (PatchGAN-style output)"""
        inputs = layers.Input(shape=self.input_shape)
        x = layers.Conv2D(64, (4, 4), strides=2, padding='same')(inputs)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(128, (4, 4), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(256, (4, 4), strides=2, padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(512, (4, 4), padding='same')(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        outputs = layers.Conv2D(1, (4, 4), padding='same')(x)
        return models.Model(inputs, outputs)
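The training helper in the applications section below references a cyclegan.combined model that this class never builds; a sketch of how it is typically assembled, with adversarial (LSGAN-style MSE), cycle-consistency, and identity losses (the helper name build_combined and the loss weights are illustrative):
# Hypothetical helper (not in the class above): builds the combined generator-training model
def build_combined(cyclegan, lambda_cycle=10.0, lambda_id=1.0):
    # Compile the discriminators first; they train directly on real/fake batches
    for d in (cyclegan.d_A, cyclegan.d_B):
        d.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5))
    img_A = layers.Input(shape=cyclegan.input_shape)
    img_B = layers.Input(shape=cyclegan.input_shape)
    fake_B = cyclegan.g_AB(img_A)
    fake_A = cyclegan.g_BA(img_B)
    # Cycle: A -> B -> A and B -> A -> B should reconstruct the inputs
    rec_A = cyclegan.g_BA(fake_B)
    rec_B = cyclegan.g_AB(fake_A)
    # Identity: mapping an image into its own domain should change little
    id_A = cyclegan.g_BA(img_A)
    id_B = cyclegan.g_AB(img_B)
    # Freeze discriminators inside the combined (generator-training) model
    cyclegan.d_A.trainable = False
    cyclegan.d_B.trainable = False
    valid_A = cyclegan.d_A(fake_A)
    valid_B = cyclegan.d_B(fake_B)
    combined = models.Model([img_A, img_B],
                            [valid_A, valid_B, rec_A, rec_B, id_A, id_B])
    combined.compile(loss=['mse', 'mse', 'mae', 'mae', 'mae', 'mae'],
                     loss_weights=[1, 1, lambda_cycle, lambda_cycle, lambda_id, lambda_id],
                     optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5))
    cyclegan.combined = combined
    return combined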
GAN Training Challenges
Mode Collapse
- Problem: Generator produces limited variety of outputs
- Solution: Use minibatch discrimination, feature matching
- Example: Add minibatch discrimination layer to discriminator
# Minibatch discrimination layer
class MinibatchDiscrimination(layers.Layer):
    def __init__(self, num_kernels, kernel_dim, **kwargs):
        super(MinibatchDiscrimination, self).__init__(**kwargs)
        self.num_kernels = num_kernels
        self.kernel_dim = kernel_dim

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name='kernel',
            shape=(input_shape[1], self.num_kernels * self.kernel_dim),
            initializer='glorot_uniform')
        super(MinibatchDiscrimination, self).build(input_shape)

    def call(self, inputs):
        # Project inputs and reshape to (batch, num_kernels, kernel_dim)
        activations = tf.matmul(inputs, self.kernel)
        activations = tf.reshape(activations, (-1, self.num_kernels, self.kernel_dim))
        # Compute L1 distance between every pair of samples in the batch
        diffs = tf.expand_dims(activations, 3) - tf.expand_dims(tf.transpose(activations, [1, 2, 0]), 0)
        abs_diffs = tf.reduce_sum(tf.abs(diffs), axis=2)
        # Similarity features: near-duplicate samples produce large values
        minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), axis=2)
        return tf.concat([inputs, minibatch_features], axis=1)
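A sketch of where the layer might sit in a dense discriminator (placement before the final classification layer is the usual choice; layer sizes are illustrative):
# Hypothetical discriminator with minibatch discrimination before the output layer
def create_mbd_discriminator(input_dim):
    inputs = layers.Input(shape=(input_dim,))
    x = layers.Dense(256)(inputs)
    x = layers.LeakyReLU(0.2)(x)
    # Appends per-sample similarity features so the critic can spot collapsed batches
    x = MinibatchDiscrimination(num_kernels=50, kernel_dim=5)(x)
    outputs = layers.Dense(1, activation='sigmoid')(x)
    return models.Model(inputs, outputs)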
Training Instability
- Problem: Unstable training, oscillations
- Solution: Use Wasserstein loss, gradient penalty
- Example: WGAN-GP implementation
# WGAN with gradient penalty (models are assumed compiled with a Wasserstein
# loss, as in the WGAN wiring sketch above)
class WGANGP:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build the combined GAN model"""
        # Freeze discriminator (critic) during generator training
        self.discriminator.trainable = False
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])
        return model

    def gradient_penalty(self, real_data, fake_data):
        """Calculate gradient penalty on random interpolates (assumes 4D image batches)"""
        # Create interpolated data
        alpha = tf.random.uniform([real_data.shape[0], 1, 1, 1], 0., 1.)
        interpolated = alpha * real_data + (1 - alpha) * fake_data
        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            # Get critic output for interpolated data
            pred = self.discriminator(interpolated)
        # Penalize deviation of the gradient norm from 1
        gradients = tape.gradient(pred, interpolated)
        gradients_norm = tf.sqrt(tf.reduce_sum(tf.square(gradients), axis=[1, 2, 3]))
        penalty = tf.reduce_mean((gradients_norm - 1.)**2)
        return penalty

    def train(self, X_train, epochs, batch_size, latent_dim, n_critic=5, gp_weight=10):
        """Train the WGAN-GP (simplified: the penalty is only monitored here;
        a faithful implementation backpropagates it in a custom train step,
        as sketched below)"""
        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        for epoch in range(epochs):
            for _ in range(n_critic):
                # --- Train critic ---
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                real_data = X_train[idx]
                noise = np.random.normal(0, 1, (batch_size, latent_dim))
                fake_data = self.generator.predict(noise, verbose=0)
                d_loss_real = self.discriminator.train_on_batch(real_data, valid)
                d_loss_fake = self.discriminator.train_on_batch(fake_data, -valid)
                # Calculate gradient penalty (reported, not backpropagated here)
                gp = self.gradient_penalty(real_data, fake_data)
                # Total critic loss
                d_loss = 0.5 * np.add(d_loss_fake, d_loss_real) + gp_weight * gp
            # --- Train generator ---
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)
            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss}] [G loss: {g_loss}] [GP: {gp}]")
Vanishing Gradients
- Problem: Discriminator becomes too strong, generator gets no gradient
- Solution: Use label smoothing, alternative loss functions
- Example: Label smoothing implementation
# Label smoothing for GAN training
def smooth_positive_labels(y):
    """Apply label smoothing to positive labels (1 -> random value in [0.7, 1.2])"""
    return y - 0.3 + (np.random.random(y.shape) * 0.5)

def smooth_negative_labels(y):
    """Apply label smoothing to negative labels (0 -> random value in [0.0, 0.3])"""
    return y + np.random.random(y.shape) * 0.3
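These helpers drop straight into the discriminator step of the basic training loop above; a usage sketch:
# Sketch: smoothed targets for the discriminator update
valid = smooth_positive_labels(np.ones((batch_size, 1)))   # roughly [0.7, 1.2]
fake = smooth_negative_labels(np.zeros((batch_size, 1)))   # roughly [0.0, 0.3]
d_loss_real = discriminator.train_on_batch(real_data, valid)
d_loss_fake = discriminator.train_on_batch(fake_data, fake)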
GAN Applications
Image Generation
# Image generation with DCGAN
import matplotlib.pyplot as plt

# Create and train DCGAN
latent_dim = 100
generator = create_dcgan_generator(latent_dim)
discriminator = create_dcgan_discriminator()
gan = GAN(generator, discriminator)

# Train on CIFAR-10, resized to the generator's 64x64 output resolution
(train_images, _), (_, _) = tf.keras.datasets.cifar10.load_data()
train_images = tf.image.resize(train_images, (64, 64)).numpy()
train_images = (train_images - 127.5) / 127.5  # Normalize to [-1, 1]
gan.train(train_images, epochs=10000, batch_size=64, latent_dim=latent_dim)

# Generate images
n = 10  # Number of images to generate
noise = np.random.normal(0, 1, (n, latent_dim))
generated_images = generator.predict(noise)

# Display generated images
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n//2, i + 1)
    plt.imshow((generated_images[i] + 1) / 2)  # Scale back to [0, 1]
    plt.axis('off')
plt.suptitle('Generated Images')
plt.show()
Image-to-Image Translation
# Image-to-image translation with CycleGAN (conceptual)
def train_cyclegan(cyclegan, X_A, X_B, epochs, batch_size):
    """Train CycleGAN on two domains (assumes cyclegan.combined was built,
    e.g. with the build_combined sketch above)"""
    # Adversarial ground truths, shaped to match the PatchGAN discriminator output
    patch_shape = cyclegan.d_A.output_shape[1:]
    valid = np.ones((batch_size,) + patch_shape)
    fake = np.zeros((batch_size,) + patch_shape)
    for epoch in range(epochs):
        # Select random batch
        idx = np.random.randint(0, X_A.shape[0], batch_size)
        real_A = X_A[idx]
        real_B = X_B[idx]
        # Generate fake images
        fake_B = cyclegan.g_AB.predict(real_A, verbose=0)
        fake_A = cyclegan.g_BA.predict(real_B, verbose=0)
        # Train discriminators
        dA_loss_real = cyclegan.d_A.train_on_batch(real_A, valid)
        dA_loss_fake = cyclegan.d_A.train_on_batch(fake_A, fake)
        dA_loss = 0.5 * np.add(dA_loss_real, dA_loss_fake)
        dB_loss_real = cyclegan.d_B.train_on_batch(real_B, valid)
        dB_loss_fake = cyclegan.d_B.train_on_batch(fake_B, fake)
        dB_loss = 0.5 * np.add(dB_loss_real, dB_loss_fake)
        # Train generators: adversarial + cycle-consistency + identity targets
        g_loss = cyclegan.combined.train_on_batch(
            [real_A, real_B],
            [valid, valid, real_A, real_B, real_A, real_B])
        # Print progress
        if epoch % 100 == 0:
            print(f"{epoch} [D loss: {0.5*(dA_loss + dB_loss)}] [G loss: {g_loss[0]}]")
Super-Resolution
# Super-resolution GAN (SRGAN) implementation
def create_srgan_generator():
    """Create a generator for 4x super-resolution (64x64 -> 256x256)"""
    # Low resolution input
    lr_input = layers.Input(shape=(64, 64, 3))
    # Pre-residual block
    x = layers.Conv2D(64, (9, 9), padding='same')(lr_input)
    x = layers.PReLU(shared_axes=[1, 2])(x)
    # Store residual for skip connections
    residual = x
    # B residual blocks
    for _ in range(16):
        x = layers.Conv2D(64, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
        x = layers.Conv2D(64, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Add()([x, residual])
        residual = x
    # Upsampling (2x, applied twice)
    x = layers.Conv2D(256, (3, 3), padding='same')(x)
    x = layers.UpSampling2D(size=(2, 2))(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)
    x = layers.Conv2D(256, (3, 3), padding='same')(x)
    x = layers.UpSampling2D(size=(2, 2))(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)
    # Output
    hr_output = layers.Conv2D(3, (9, 9), padding='same', activation='tanh')(x)
    return models.Model(lr_input, hr_output)
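SRGAN trains this generator against a perceptual objective: an adversarial term plus a VGG feature-space content loss; a minimal sketch of the content loss (the choice of block5_conv4 and the input rescaling are assumptions):
# Sketch: VGG19 feature-space content loss in the spirit of SRGAN
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input as vgg_preprocess

vgg = VGG19(include_top=False, weights='imagenet', input_shape=(256, 256, 3))
# Truncate at a deep feature map; this exact layer is a common but not unique choice
feature_extractor = models.Model(vgg.input, vgg.get_layer('block5_conv4').output)
feature_extractor.trainable = False

def content_loss(hr_real, hr_fake):
    # Map tanh outputs in [-1, 1] to the [0, 255] range VGG preprocessing expects
    real_feats = feature_extractor(vgg_preprocess((hr_real + 1) * 127.5))
    fake_feats = feature_extractor(vgg_preprocess((hr_fake + 1) * 127.5))
    return tf.reduce_mean(tf.square(real_feats - fake_feats))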
Style Transfer
# Style transfer with GAN (conceptual)
class StyleGAN:
    def __init__(self, content_shape, style_shape):
        self.content_shape = content_shape
        self.style_shape = style_shape
        # Create networks
        self.encoder = self._create_encoder()
        self.decoder = self._create_decoder()
        # A discriminator for adversarial training would also be created here;
        # its definition is omitted in this conceptual sketch

    def _create_encoder(self):
        """Create encoder network"""
        content_input = layers.Input(shape=self.content_shape)
        style_input = layers.Input(shape=self.style_shape)
        # Process content
        x = layers.Conv2D(32, (7, 7), padding='same')(content_input)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        # Process style
        y = layers.Conv2D(32, (7, 7), padding='same')(style_input)
        y = tfa.layers.InstanceNormalization()(y)
        y = layers.Activation('relu')(y)
        # Combine features (32 + 32 = 64 channels)
        combined = layers.Concatenate()([x, y])
        return models.Model([content_input, style_input], combined)

    def _create_decoder(self):
        """Create decoder network (the 2x upsampling stages assume the encoder
        downsampled accordingly; spatial sizes in this sketch are illustrative)"""
        inputs = layers.Input(shape=(None, None, 64))
        # Upsample
        x = layers.Conv2D(64, (3, 3), padding='same')(inputs)
        x = layers.UpSampling2D(size=(2, 2))(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Conv2D(32, (3, 3), padding='same')(x)
        x = layers.UpSampling2D(size=(2, 2))(x)
        x = tfa.layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        # Output
        outputs = layers.Conv2D(3, (7, 7), padding='same', activation='tanh')(x)
        return models.Model(inputs, outputs)

    def transfer_style(self, content_image, style_image):
        """Transfer style from style_image to content_image"""
        # Encode
        features = self.encoder.predict([content_image, style_image])
        # Decode
        stylized = self.decoder.predict(features)
        return stylized
GAN Research
Key Papers
- "Generative Adversarial Nets" (Goodfellow et al., 2014)
- Introduced the GAN framework
- Demonstrated adversarial training
- Foundation for GAN research
- "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (Radford et al., 2015)
- Introduced DCGAN architecture
- Demonstrated stable training with CNNs
- Foundation for convolutional GANs
- "Wasserstein GAN" (Arjovsky et al., 2017)
- Introduced Wasserstein loss
- Improved training stability
- Addressed mode collapse
- "Improved Training of Wasserstein GANs" (Gulrajani et al., 2017)
- Introduced gradient penalty
- Further improved WGAN training
- Foundation for WGAN-GP
- "CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (Zhu et al., 2017)
- Introduced CycleGAN
- Enabled unpaired image translation
- Foundation for style transfer
- "Progressive Growing of GANs for Improved Quality, Stability, and Variation" (Karras et al., 2017)
- Introduced progressive growing
- Demonstrated high-resolution image generation
- Foundation for StyleGAN
GAN Best Practices
Implementation Guidelines
| Aspect | Recommendation | Notes |
|---|---|---|
| Architecture | Use DCGAN architecture for images | Good starting point for image GANs |
| Generator | Use batch normalization | Stabilizes training |
| Discriminator | Use leaky ReLU, dropout | Prevents mode collapse |
| Loss Function | Start with binary cross-entropy | Consider Wasserstein for stability |
| Optimizer | Adam with low learning rate | lr=0.0002, beta1=0.5 often works well |
| Batch Size | 32-128 depending on GPU memory | Larger batches for stability |
| Latent Dimension | 100-512 dimensions | Balance expressiveness and complexity |
| Normalization | Normalize data to [-1, 1] | Works well with tanh activation |
| Training Ratio | Train discriminator more than generator | n_critic=5 for WGAN |
| Monitoring | Monitor both losses | Losses should stay roughly balanced, not steadily diverge |
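As a concrete example, the normalization and optimizer rows translate into code like the following (raw_images and discriminator are placeholders for your data and model):
# Sketch: applying the table's recommendations
latent_dim = 128  # within the recommended 100-512 range
data = (raw_images.astype('float32') - 127.5) / 127.5  # normalize to [-1, 1] for tanh outputs
opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
discriminator.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])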
Common Pitfalls and Solutions
| Pitfall | Solution | Example |
|---|---|---|
| Mode Collapse | Use minibatch discrimination, feature matching | Add minibatch layer to discriminator |
| Training Instability | Use Wasserstein loss, gradient penalty | Switch to WGAN-GP |
| Vanishing Gradients | Use label smoothing, alternative losses | Apply label smoothing |
| Poor Generation Quality | Increase model capacity, train longer | Add more layers to generator |
| Slow Convergence | Adjust learning rate, use momentum | Use Adam optimizer with lr=0.0002 |
| Discriminator Too Strong | Train generator more, reduce discriminator capacity | Reduce discriminator layers |
| Generator Too Weak | Increase generator capacity | Add more layers to generator |
| Overfitting | Add regularization, early stopping | Add dropout with p=0.3 |
GAN Evaluation Metrics
Quantitative Metrics
| Metric | Description | Formula/Implementation |
|---|---|---|
| Inception Score (IS) | Measures quality and diversity of generated images | exp(E_x[KL(p(y\|x) ‖ p(y))]) |
| Fréchet Inception Distance (FID) | Measures similarity between real and generated distributions | ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)) |
| Kernel Inception Distance (KID) | Alternative to FID, more robust to small sample sizes | MMD² between real and generated features |
| Precision and Recall | Measures quality (precision) and diversity (recall) | Based on manifold estimation |
| Perceptual Path Length (PPL) | Measures smoothness of latent space | E[d(G(z_t), G(z_(t+ε))) / ε²] for small latent steps ε |
| Linear Separability | Measures disentanglement of latent factors | Accuracy of linear classifier on latent factors |
Qualitative Evaluation
- Visual Inspection: Manually examine generated samples
- Nearest Neighbors: Compare generated samples to training data
- Interpolation: Test smoothness of latent space interpolation
- Attribute Manipulation: Test controllability of generation
- User Studies: Human evaluation of generated samples
# Inception Score implementation
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
import numpy as np

def calculate_inception_score(images, n_split=10, eps=1E-16):
    """Calculate Inception Score for generated images (expects pixel values in [0, 255])"""
    # Load InceptionV3 model (expects 299x299 inputs)
    model = InceptionV3()
    # Resize and preprocess images
    images = tf.image.resize(images, (299, 299)).numpy()
    images = preprocess_input(images)
    # Get class predictions p(y|x)
    preds = model.predict(images)
    # Split into groups
    split_scores = []
    n_part = images.shape[0] // n_split
    for i in range(n_split):
        ix_start, ix_end = i * n_part, (i+1) * n_part
        p_yx = preds[ix_start:ix_end]
        # Marginal class distribution p(y) over the split
        p_y = np.expand_dims(p_yx.mean(axis=0), 0)
        # KL(p(y|x) || p(y)) per image, summed over classes
        kl_d = p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))
        sum_kl_d = kl_d.sum(axis=1)
        avg_kl_d = np.mean(sum_kl_d)
        split_scores.append(np.exp(avg_kl_d))
    is_score = np.mean(split_scores)
    is_std = np.std(split_scores)
    return is_score, is_std
# Fréchet Inception Distance implementation
from scipy.linalg import sqrtm

def calculate_fid(real_images, generated_images):
    """Calculate Fréchet Inception Distance (expects 299x299x3 images in [0, 255])"""
    # Load InceptionV3 feature extractor
    model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))
    # Preprocess images
    real_images = preprocess_input(real_images)
    generated_images = preprocess_input(generated_images)
    # Get features
    real_features = model.predict(real_images)
    generated_features = model.predict(generated_images)
    # Calculate mean and covariance of each feature set
    mu1, sigma1 = real_features.mean(axis=0), np.cov(real_features, rowvar=False)
    mu2, sigma2 = generated_features.mean(axis=0), np.cov(generated_features, rowvar=False)
    # FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*sqrt(sigma1*sigma2))
    ssdiff = np.sum((mu1 - mu2)**2.0)
    covmean = sqrtm(sigma1.dot(sigma2))
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
    return fid
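A usage sketch comparing real and generated batches from the DCGAN examples above (images are mapped from [-1, 1] back to [0, 255] and resized to Inception's input size; the batch size of 500 is arbitrary):
# Sketch: FID between real training images and DCGAN samples
real_batch = tf.image.resize(train_images[:500] * 127.5 + 127.5, (299, 299)).numpy()
noise = np.random.normal(0, 1, (500, latent_dim))
fake = generator.predict(noise, verbose=0) * 127.5 + 127.5
fake_batch = tf.image.resize(fake, (299, 299)).numpy()
print(f"FID: {calculate_fid(real_batch, fake_batch):.2f}")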
GAN in Practice
Case Study: Face Generation
# Face generation with DCGAN
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt

# Load CelebA dataset (conceptual)
# In practice, you would load actual face images
def load_celeba():
    # Placeholder: CIFAR-10 stands in for real face images, resized to 64x64
    (x_train, _), (_, _) = tf.keras.datasets.cifar10.load_data()
    x_train = tf.image.resize(x_train, (64, 64)).numpy()
    x_train = (x_train - 127.5) / 127.5  # Normalize to [-1, 1]
    return x_train

# Create DCGAN
latent_dim = 100
generator = create_dcgan_generator(latent_dim)
discriminator = create_dcgan_discriminator()
gan = GAN(generator, discriminator)

# Train
X_train = load_celeba()
gan.train(X_train, epochs=20000, batch_size=64, latent_dim=latent_dim)

# Generate faces
n = 10  # Number of faces to generate
noise = np.random.normal(0, 1, (n, latent_dim))
generated_faces = generator.predict(noise)

# Display generated faces
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n//2, i + 1)
    plt.imshow((generated_faces[i] + 1) / 2)  # Scale back to [0, 1]
    plt.axis('off')
plt.suptitle('Generated Faces')
plt.show()

# Latent space interpolation
def interpolate_faces(generator, n=10):
    """Interpolate between two random faces"""
    # Generate two random latent vectors
    z1 = np.random.normal(0, 1, (1, latent_dim))
    z2 = np.random.normal(0, 1, (1, latent_dim))
    # Create interpolation
    interpolated = []
    for alpha in np.linspace(0, 1, n):
        z = alpha * z1 + (1 - alpha) * z2
        generated = generator.predict(z, verbose=0)
        interpolated.append(generated[0])
    # Display interpolation
    plt.figure(figsize=(20, 2))
    for i in range(n):
        ax = plt.subplot(1, n, i + 1)
        plt.imshow((interpolated[i] + 1) / 2)
        plt.axis('off')
    plt.suptitle('Latent Space Interpolation')
    plt.show()

interpolate_faces(generator)
Case Study: Art Generation
# Art generation with StyleGAN (conceptual)
class ArtGAN:
    def __init__(self):
        self.latent_dim = 512
        self.generator = self._create_stylegan_generator()
        self.discriminator = self._create_stylegan_discriminator()

    def _create_stylegan_generator(self):
        """Create StyleGAN generator (simplified)"""
        # Mapping network: z -> w
        z_input = layers.Input(shape=(self.latent_dim,))
        x = layers.Dense(512)(z_input)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dense(512)(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dense(512)(x)
        x = layers.LeakyReLU(0.2)(x)
        w = layers.Dense(512)(x)
        # Synthesis network, starting from a 1x1x512 input
        inputs = layers.Input(shape=(1, 1, 512))
        x = layers.Conv2DTranspose(512, (4, 4), use_bias=False)(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        # Style modulation (a crude stand-in for StyleGAN's AdaIN modulation)
        style = layers.Dense(512)(w)
        style = layers.Reshape((1, 1, 512))(style)
        x = x * style + x
        # Upsample blocks: 4x4 -> 64x64
        for i in range(4):
            x = layers.UpSampling2D()(x)
            x = layers.Conv2D(512 // (2**i), (3, 3), padding='same', use_bias=False)(x)
            x = layers.BatchNormalization()(x)
            x = layers.LeakyReLU(0.2)(x)
        # Add noise (GaussianNoise adds zero-mean noise to its input during training)
        x = layers.GaussianNoise(0.1)(x)
        # Output
        outputs = layers.Conv2D(3, (1, 1), activation='tanh')(x)
        return models.Model([z_input, inputs], outputs)

    def _create_stylegan_discriminator(self):
        """Create StyleGAN discriminator"""
        inputs = layers.Input(shape=(64, 64, 3))
        x = layers.Conv2D(64, (4, 4), strides=2, padding='same')(inputs)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(128, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(256, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Conv2D(512, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Flatten()(x)
        outputs = layers.Dense(1)(x)
        return models.Model(inputs, outputs)

    def generate_art(self, n_samples):
        """Generate art samples"""
        z = np.random.normal(0, 1, (n_samples, self.latent_dim))
        constant_input = np.random.normal(0, 1, (n_samples, 1, 1, 512))
        generated = self.generator.predict([z, constant_input])
        return (generated + 1) / 2  # Scale to [0, 1]
Future Directions
- Stable Training: More robust and stable GAN training methods
- High-Fidelity Generation: Generating extremely high-resolution images
- Disentangled Representations: Learning independent, interpretable factors
- Controllable Generation: Precise control over generated outputs
- Few-Shot Generation: GANs that learn from few examples
- 3D Generation: Generating 3D objects and scenes
- Video Generation: Generating realistic videos
- Neuromorphic GANs: Brain-inspired GAN architectures
- Quantum GANs: GANs for quantum computing
- Explainable GANs: More interpretable GAN architectures
- Energy-Efficient GANs: Green computing approaches
- Multimodal GANs: GANs for multiple data modalities
- Continual Learning GANs: GANs that learn continuously
- Self-Supervised GANs: GANs for self-supervised learning
- Ethical GANs: GANs with built-in ethical constraints
External Resources
- GAN Paper (Goodfellow et al.)
- GANs (Wikipedia)
- GAN Tutorial (TensorFlow)
- GAN Lab (Interactive Visualization)
- DCGAN Paper (Radford et al.)
- WGAN Paper (Arjovsky et al.)
- WGAN-GP Paper (Gulrajani et al.)
- CycleGAN Paper (Zhu et al.)
- Progressive GAN Paper (Karras et al.)
- StyleGAN Paper (Karras et al.)
- GANs for Art Generation (Elgammal et al.)
- GAN Evaluation Metrics (arXiv)
- GAN Zoo (GitHub)
- GANs in PyTorch (PyTorch Documentation)
- Deep Learning Book - GANs Chapter