Generative Adversarial Network (GAN)

Deep learning framework in which two neural networks compete: one generates realistic synthetic data while the other tries to distinguish real data from fake.

What is a Generative Adversarial Network?

A generative adversarial network (GAN) is a deep learning framework consisting of two neural networks competing in a zero-sum game: a generator that creates synthetic data, and a discriminator that distinguishes between real and generated data. This adversarial process drives both networks to improve, resulting in the generator producing increasingly realistic outputs.

Key Characteristics

  • Adversarial Training: Two networks compete against each other
  • Generative Model: Can generate new data samples
  • Unsupervised Learning: Learns from unlabeled data
  • Zero-Sum Game: One network's gain is the other's loss
  • Minimax Optimization: Solves a minimax game problem
  • No Explicit Loss Function: Loss emerges from the competition
  • High-Quality Outputs: Can generate photorealistic images
  • Mode Collapse Risk: Potential to generate limited variety

Architecture Overview

graph LR
    A[Random Noise] --> B[Generator Network]
    B --> C[Generated Data]
    D[Real Data] --> E[Discriminator Network]
    C --> E
    E --> F[Real/Fake Probability]
    F -->|Feedback| B
    F -->|Feedback| E

Mathematical Representation

The GAN training objective is a minimax game:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]

Where:

  • G is the generator network
  • D is the discriminator network
  • x is real data, sampled from the data distribution p_data(x)
  • z is random noise, sampled from a prior p_z(z) (e.g., a standard Gaussian)
  • D(x) is the discriminator's estimate of real data being real
  • G(z) is the generator's output from noise
  • D(G(z)) is the discriminator's estimate of generated data being real
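
As a quick numeric illustration, the value function can be evaluated for a batch of hypothetical discriminator outputs. The sketch below (illustrative values only) also shows the non-saturating generator loss -E[log D(G(z))], which is commonly used in practice because the minimax form gives weak gradients early in training.

# Evaluating the minimax objective on hypothetical discriminator outputs
import numpy as np

d_real = np.array([0.9, 0.8, 0.95])   # D(x): confidence that real data is real
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)): confidence that fakes are real

# Value of V(D, G) for these outputs
v = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
print(f"V(D, G) = {v:.4f}")

# Non-saturating generator loss: -E[log D(G(z))] gives stronger gradients
# early in training than minimizing log(1 - D(G(z)))
g_loss = -np.mean(np.log(d_fake))
print(f"Generator loss (non-saturating): {g_loss:.4f}")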

Core Components

Generator Network

  • Maps random noise to synthetic data
  • Typically uses a feedforward or transposed-convolution (CNN) architecture
  • Learns to generate realistic data samples
  • Goal: fool the discriminator
# Simple generator implementation
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def create_generator(latent_dim, output_dim):
    """Create a generator network"""
    model = tf.keras.Sequential([
        layers.Dense(128, activation='relu', input_dim=latent_dim),
        layers.BatchNormalization(),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dense(output_dim, activation='tanh')
    ])
    return model

Discriminator Network

  • Classifies data as real or fake
  • Typically uses a CNN for image data, an MLP for other data types
  • Learns to distinguish real from generated data
  • Goal: correctly identify real vs fake data
# Simple discriminator implementation
def create_discriminator(input_dim):
    """Create a discriminator network"""
    model = tf.keras.Sequential([
        layers.Dense(512, input_dim=input_dim),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(256),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),
        layers.Dense(1, activation='sigmoid')
    ])
    return model

Adversarial Training

# GAN training implementation
class GAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator
        # Compile the discriminator before freezing it inside the combined model
        self.discriminator.compile(
            optimizer=tf.keras.optimizers.Adam(0.0002, beta_1=0.5),
            loss='binary_crossentropy',
            metrics=['accuracy'])
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build and compile the combined GAN model"""
        # Freeze discriminator weights during generator training
        # (its own already-compiled training step is unaffected)
        self.discriminator.trainable = False

        # Stacked generator -> discriminator model
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(0.0002, beta_1=0.5),
            loss='binary_crossentropy')

        return model

    def train(self, X_train, epochs, batch_size, latent_dim):
        """Train the GAN"""
        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):
            # Train discriminator
            # Select random batch of real data
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            real_data = X_train[idx]

            # Generate fake data
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            fake_data = self.generator.predict(noise)

            # Train discriminator
            d_loss_real = self.discriminator.train_on_batch(real_data, valid)
            d_loss_fake = self.discriminator.train_on_batch(fake_data, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # Train generator
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)

            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss[0]} | D accuracy: {100*d_loss[1]}] [G loss: {g_loss}]")

GAN Variants

Deep Convolutional GAN (DCGAN)

# DCGAN generator implementation
def create_dcgan_generator(latent_dim):
    """Create a DCGAN generator for 64x64 images"""
    model = tf.keras.Sequential([
        # Start with a dense layer
        layers.Dense(4*4*512, use_bias=False, input_shape=(latent_dim,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # Reshape into a 4x4 feature map
        layers.Reshape((4, 4, 512)),

        # Upsample to 8x8
        layers.Conv2DTranspose(256, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # Upsample to 16x16
        layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # Upsample to 32x32
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(),

        # Upsample to 64x64 and map to 3 output channels
        layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh')
    ])
    return model
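
The examples later in this article call create_dcgan_discriminator(), which is not defined in the original text; a minimal matching critic for 64x64 RGB images might look like the following sketch.

# DCGAN discriminator sketch for 64x64 RGB images (assumed by later
# examples that call create_dcgan_discriminator())
def create_dcgan_discriminator(input_shape=(64, 64, 3)):
    """Create a DCGAN discriminator for 64x64 images"""
    model = tf.keras.Sequential([
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                      input_shape=input_shape),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),

        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),

        layers.Conv2D(256, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),

        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model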

Wasserstein GAN (WGAN)

# WGAN implementation with weight clipping (the original WGAN formulation)
class WGAN:
    def __init__(self, generator, discriminator):
        self.generator = generator
        # The critic (discriminator) is assumed to have a linear output and
        # to be compiled with a Wasserstein loss (see the sketch below)
        self.discriminator = discriminator
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build the combined GAN model"""
        # Freeze discriminator during generator training
        self.discriminator.trainable = False

        # GAN model
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])

        return model

    def train(self, X_train, epochs, batch_size, latent_dim, n_critic=5, clip_value=0.01):
        """Train the WGAN"""
        # Critic score targets: +1 for real, -1 for fake
        valid = np.ones((batch_size, 1))

        for epoch in range(epochs):
            for _ in range(n_critic):
                # Train discriminator (critic)
                # Select random batch of real data
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                real_data = X_train[idx]

                # Generate fake data
                noise = np.random.normal(0, 1, (batch_size, latent_dim))
                fake_data = self.generator.predict(noise)

                # Train critic
                d_loss_real = self.discriminator.train_on_batch(real_data, valid)
                d_loss_fake = self.discriminator.train_on_batch(fake_data, -valid)
                d_loss = 0.5 * np.add(d_loss_fake, d_loss_real)

                # Clip critic weights
                for layer in self.discriminator.layers:
                    weights = layer.get_weights()
                    weights = [np.clip(w, -clip_value, clip_value) for w in weights]
                    layer.set_weights(weights)

            # Train generator
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)

            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss}] [G loss: {g_loss}]")

CycleGAN

# CycleGAN implementation (conceptual)
# Note: InstanceNormalization is not in core Keras; this sketch assumes it
# is available (e.g. tfa.layers.InstanceNormalization from tensorflow_addons)
# and aliased as layers.InstanceNormalization.
class CycleGAN:
    def __init__(self, input_shape):
        self.input_shape = input_shape

        # Create generators
        self.g_AB = self._create_generator()  # A -> B
        self.g_BA = self._create_generator()  # B -> A

        # Create discriminators
        self.d_A = self._create_discriminator()  # Discriminates real/fake A
        self.d_B = self._create_discriminator()  # Discriminates real/fake B

    def _create_generator(self):
        """Create a generator network for CycleGAN"""
        # Encoder
        inputs = layers.Input(shape=self.input_shape)
        x = layers.Conv2D(32, (7, 7), padding='same')(inputs)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        # Downsampling
        x = layers.Conv2D(64, (3, 3), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.Conv2D(128, (3, 3), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        # Residual blocks
        for _ in range(6):
            x = self._residual_block(x)

        # Upsampling
        x = layers.Conv2DTranspose(64, (3, 3), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        # Output
        x = layers.Conv2D(3, (7, 7), padding='same')(x)
        outputs = layers.Activation('tanh')(x)

        return models.Model(inputs, outputs)

    def _residual_block(self, x):
        """Residual block for CycleGAN generator"""
        shortcut = x
        x = layers.Conv2D(128, (3, 3), padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)
        x = layers.Conv2D(128, (3, 3), padding='same')(x)
        x = layers.InstanceNormalization()(x)
        return layers.Add()([x, shortcut])

    def _create_discriminator(self):
        """Create a discriminator network for CycleGAN"""
        inputs = layers.Input(shape=self.input_shape)
        x = layers.Conv2D(64, (4, 4), strides=2, padding='same')(inputs)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(128, (4, 4), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(256, (4, 4), strides=2, padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(512, (4, 4), padding='same')(x)
        x = layers.InstanceNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        outputs = layers.Conv2D(1, (4, 4), padding='same')(x)

        return models.Model(inputs, outputs)
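
The training routine shown later in this article calls cyclegan.combined, a composite model that is not built above. A minimal sketch of that wiring, with assumed (typical, not prescribed) loss weights, could be added to the class as follows.

# Sketch of the combined model assumed by train_cyclegan() below.
# Note: d_A/d_B output PatchGAN-style maps, so in practice the
# adversarial targets are patch-shaped, not (batch, 1) vectors.
def _build_combined(self):
    img_A = layers.Input(shape=self.input_shape)
    img_B = layers.Input(shape=self.input_shape)

    # Translate to the other domain and back (cycle consistency)
    fake_B = self.g_AB(img_A)
    fake_A = self.g_BA(img_B)
    cycled_A = self.g_BA(fake_B)
    cycled_B = self.g_AB(fake_A)

    # Identity mappings (each generator should preserve its own domain)
    same_A = self.g_BA(img_A)
    same_B = self.g_AB(img_B)

    # Only the generators are trained through the combined model
    self.d_A.trainable = False
    self.d_B.trainable = False
    valid_A = self.d_A(fake_A)
    valid_B = self.d_B(fake_B)

    self.combined = models.Model(
        [img_A, img_B],
        [valid_A, valid_B, cycled_A, cycled_B, same_A, same_B])
    self.combined.compile(
        optimizer=tf.keras.optimizers.Adam(2e-4, beta_1=0.5),
        loss=['mse', 'mse', 'mae', 'mae', 'mae', 'mae'],
        loss_weights=[1, 1, 10, 10, 5, 5])  # assumed typical weights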

GAN Training Challenges

Mode Collapse

  • Problem: Generator produces limited variety of outputs
  • Solution: Use minibatch discrimination or feature matching (both sketched below)
  • Example: Add minibatch discrimination layer to discriminator
# Minibatch discrimination layer
class MinibatchDiscrimination(layers.Layer):
    def __init__(self, num_kernels, kernel_dim, **kwargs):
        super(MinibatchDiscrimination, self).__init__(**kwargs)
        self.num_kernels = num_kernels
        self.kernel_dim = kernel_dim

    def build(self, input_shape):
        self.kernel = self.add_weight(
            name='kernel',
            shape=(input_shape[1], self.num_kernels * self.kernel_dim),
            initializer='glorot_uniform')

    def call(self, inputs):
        # Compute activations
        activations = tf.matmul(inputs, self.kernel)
        activations = tf.reshape(activations, (-1, self.num_kernels, self.kernel_dim))

        # Compute L1 distance between samples
        diffs = tf.expand_dims(activations, 3) - tf.expand_dims(tf.transpose(activations, [1, 2, 0]), 0)
        abs_diffs = tf.reduce_sum(tf.abs(diffs), axis=2)

        # Compute minibatch features
        minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), axis=2)

        return tf.concat([inputs, minibatch_features], axis=1)
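
Feature matching, the other remedy listed above, trains the generator to match statistics of an intermediate discriminator layer rather than its final verdict. A minimal sketch, assuming a feature_extractor model that exposes such a layer:

# Feature matching loss sketch; feature_extractor is an assumed model, e.g.
# tf.keras.Model(discriminator.input, discriminator.layers[-2].output)
def feature_matching_loss(feature_extractor, real_data, fake_data):
    real_features = feature_extractor(real_data)
    fake_features = feature_extractor(fake_data)
    # Match the mean feature activations across the batch
    return tf.reduce_mean(tf.square(
        tf.reduce_mean(real_features, axis=0) -
        tf.reduce_mean(fake_features, axis=0)))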

Training Instability

  • Problem: Unstable training, oscillations
  • Solution: Use Wasserstein loss, gradient penalty
  • Example: WGAN-GP implementation
# WGAN with gradient penalty
class WGANGP:
    def __init__(self, generator, discriminator):
        self.generator = generator
        self.discriminator = discriminator
        self.gan = self._build_gan()

    def _build_gan(self):
        """Build the combined GAN model"""
        # Freeze discriminator during generator training
        self.discriminator.trainable = False

        # GAN model
        model = tf.keras.Sequential([
            self.generator,
            self.discriminator
        ])

        return model

    def gradient_penalty(self, real_data, fake_data):
        """Calculate gradient penalty"""
        # Create interpolated data
        alpha = tf.random.uniform([real_data.shape[0], 1, 1, 1], 0., 1.)
        interpolated = alpha * real_data + (1 - alpha) * fake_data

        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            # Get discriminator output for interpolated data
            pred = self.discriminator(interpolated)

        # Calculate gradients
        gradients = tape.gradient(pred, interpolated)
        gradients_norm = tf.sqrt(tf.reduce_sum(tf.square(gradients), axis=[1, 2, 3]))
        penalty = tf.reduce_mean((gradients_norm - 1.)**2)

        return penalty

    def train(self, X_train, epochs, batch_size, latent_dim, n_critic=5, gp_weight=10):
        """Train the WGAN-GP"""
        # Critic score targets: +1 for real, -1 for fake
        valid = np.ones((batch_size, 1))

        for epoch in range(epochs):
            for _ in range(n_critic):
                # Train discriminator (critic)
                # Select random batch of real data
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                real_data = X_train[idx]

                # Generate fake data
                noise = np.random.normal(0, 1, (batch_size, latent_dim))
                fake_data = self.generator.predict(noise)

                # Train critic
                d_loss_real = self.discriminator.train_on_batch(real_data, valid)
                d_loss_fake = self.discriminator.train_on_batch(fake_data, -valid)

                # Calculate gradient penalty (reported for monitoring here;
                # see the train-step sketch below for applying it to the
                # critic's gradient update)
                gp = self.gradient_penalty(real_data, fake_data)

                # Total critic loss
                d_loss = 0.5 * np.add(d_loss_fake, d_loss_real) + gp_weight * gp

            # Train generator
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            g_loss = self.gan.train_on_batch(noise, valid)

            # Print progress
            if epoch % 100 == 0:
                print(f"{epoch} [D loss: {d_loss}] [G loss: {g_loss}] [GP: {gp}]")

Vanishing Gradients

  • Problem: Discriminator becomes too strong, generator gets no gradient
  • Solution: Use label smoothing or an alternative generator loss (both sketched below)
  • Example: Label smoothing implementation
# Label smoothing for GAN training
def smooth_positive_labels(y):
    """Apply label smoothing to positive labels"""
    return y - 0.3 + (np.random.random(y.shape) * 0.5)

def smooth_negative_labels(y):
    """Apply label smoothing to negative labels"""
    return y + np.random.random(y.shape) * 0.3
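
The most common alternative loss is the non-saturating generator loss from the original GAN paper: instead of minimizing log(1 - D(G(z))), the generator maximizes log D(G(z)), which keeps gradients alive when the discriminator confidently rejects fakes. A minimal sketch:

# Non-saturating generator loss: label fakes as "real" for the generator,
# which is equivalent to minimizing -E[log D(G(z))]
bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss_non_saturating(fake_output):
    return bce(tf.ones_like(fake_output), fake_output)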

GAN Applications

Image Generation

# Image generation with DCGAN
import matplotlib.pyplot as plt

# Create and train DCGAN
latent_dim = 100
generator = create_dcgan_generator(latent_dim)
discriminator = create_dcgan_discriminator()
gan = GAN(generator, discriminator)

# Train on CIFAR-10, resized to 64x64 to match the generator's output
(train_images, _), (_, _) = tf.keras.datasets.cifar10.load_data()
train_images = tf.image.resize(train_images, (64, 64)).numpy()
train_images = (train_images - 127.5) / 127.5  # Normalize to [-1, 1]

gan.train(train_images, epochs=10000, batch_size=64, latent_dim=latent_dim)

# Generate images
n = 10  # Number of images to generate
noise = np.random.normal(0, 1, (n, latent_dim))
generated_images = generator.predict(noise)

# Display generated images
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n//2, i + 1)
    plt.imshow((generated_images[i] + 1) / 2)  # Scale back to [0, 1]
    plt.axis('off')
plt.suptitle('Generated Images')
plt.show()

Image-to-Image Translation

# Image-to-image translation with CycleGAN (conceptual)
def train_cyclegan(cyclegan, X_A, X_B, epochs, batch_size):
    """Train CycleGAN on two domains"""
    # Adversarial ground truths
    valid = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))

    for epoch in range(epochs):
        # Select random batch
        idx = np.random.randint(0, X_A.shape[0], batch_size)
        real_A = X_A[idx]
        real_B = X_B[idx]

        # Generate fake images
        fake_B = cyclegan.g_AB.predict(real_A)
        fake_A = cyclegan.g_BA.predict(real_B)

        # Train discriminators
        dA_loss_real = cyclegan.d_A.train_on_batch(real_A, valid)
        dA_loss_fake = cyclegan.d_A.train_on_batch(fake_A, fake)
        dA_loss = 0.5 * np.add(dA_loss_real, dA_loss_fake)

        dB_loss_real = cyclegan.d_B.train_on_batch(real_B, valid)
        dB_loss_fake = cyclegan.d_B.train_on_batch(fake_B, fake)
        dB_loss = 0.5 * np.add(dB_loss_real, dB_loss_fake)

        # Train generators through the combined model
        # (adversarial + cycle + identity losses; see the _build_combined sketch)
        g_loss = cyclegan.combined.train_on_batch(
            [real_A, real_B],
            [valid, valid, real_A, real_B, real_A, real_B])

        # Print progress
        if epoch % 100 == 0:
            print(f"{epoch} [D loss: {0.5*(dA_loss + dB_loss)}] [G loss: {g_loss[0]}]")

Super-Resolution

# Super-resolution GAN (SRGAN) implementation
def create_srgan_generator():
    """Create a generator for super-resolution"""
    # Low resolution input
    lr_input = layers.Input(shape=(64, 64, 3))

    # Pre-residual block
    x = layers.Conv2D(64, (9, 9), padding='same')(lr_input)
    x = layers.PReLU(shared_axes=[1, 2])(x)

    # Store residual for skip connection
    residual = x

    # B residual blocks
    for _ in range(16):
        x = layers.Conv2D(64, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
        x = layers.Conv2D(64, (3, 3), padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Add()([x, residual])
        residual = x

    # Upsampling
    x = layers.Conv2D(256, (3, 3), padding='same')(x)
    x = layers.UpSampling2D(size=(2, 2))(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)

    x = layers.Conv2D(256, (3, 3), padding='same')(x)
    x = layers.UpSampling2D(size=(2, 2))(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)

    # Output
    hr_output = layers.Conv2D(3, (9, 9), padding='same', activation='tanh')(x)

    return models.Model(lr_input, hr_output)
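
As a quick sanity check (assuming the function above), the two 2x upsampling stages give 4x super-resolution:

# Two 2x upsampling stages: 64x64 input -> 256x256 output
srgan_generator = create_srgan_generator()
print(srgan_generator.output_shape)  # (None, 256, 256, 3)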

Style Transfer

# Style transfer with GAN (conceptual)
class StyleGAN:
    def __init__(self, content_shape, style_shape):
        self.content_shape = content_shape
        self.style_shape = style_shape

        # Create networks
        self.encoder = self._create_encoder()
        self.decoder = self._create_decoder()
        self.discriminator = self._create_discriminator()

    def _create_encoder(self):
        """Create encoder network"""
        content_input = layers.Input(shape=self.content_shape)
        style_input = layers.Input(shape=self.style_shape)

        # Process content
        x = layers.Conv2D(32, (7, 7), padding='same')(content_input)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        # Process style
        y = layers.Conv2D(32, (7, 7), padding='same')(style_input)
        y = layers.InstanceNormalization()(y)
        y = layers.Activation('relu')(y)

        # Combine features
        combined = layers.Concatenate()([x, y])

        return models.Model([content_input, style_input], combined)

    def _create_decoder(self):
        """Create decoder network"""
        inputs = layers.Input(shape=(None, None, 64))

        # Upsample
        x = layers.Conv2D(64, (3, 3), padding='same')(inputs)
        x = layers.UpSampling2D(size=(2, 2))(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        x = layers.Conv2D(32, (3, 3), padding='same')(x)
        x = layers.UpSampling2D(size=(2, 2))(x)
        x = layers.InstanceNormalization()(x)
        x = layers.Activation('relu')(x)

        # Output
        outputs = layers.Conv2D(3, (7, 7), padding='same', activation='tanh')(x)

        return models.Model(inputs, outputs)

    def transfer_style(self, content_image, style_image):
        """Transfer style from style_image to content_image"""
        # Encode
        features = self.encoder.predict([content_image, style_image])

        # Decode
        stylized = self.decoder.predict(features)

        return stylized

GAN Research

Key Papers

  1. "Generative Adversarial Nets" (Goodfellow et al., 2014)
    • Introduced the GAN framework
    • Demonstrated adversarial training
    • Foundation for GAN research
  2. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (Radford et al., 2015)
    • Introduced DCGAN architecture
    • Demonstrated stable training with CNNs
    • Foundation for convolutional GANs
  3. "Wasserstein GAN" (Arjovsky et al., 2017)
    • Introduced Wasserstein loss
    • Improved training stability
    • Addressed mode collapse
  4. "Improved Training of Wasserstein GANs" (Gulrajani et al., 2017)
    • Introduced gradient penalty
    • Further improved WGAN training
    • Foundation for WGAN-GP
  5. "CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (Zhu et al., 2017)
    • Introduced CycleGAN
    • Enabled unpaired image translation
    • Foundation for style transfer
  6. "Progressive Growing of GANs for Improved Quality, Stability, and Variation" (Karras et al., 2017)
    • Introduced progressive growing
    • Demonstrated high-resolution image generation
    • Foundation for StyleGAN

GAN Best Practices

Implementation Guidelines

| Aspect | Recommendation | Notes |
| --- | --- | --- |
| Architecture | Use DCGAN architecture for images | Good starting point for image GANs |
| Generator | Use batch normalization | Stabilizes training |
| Discriminator | Use leaky ReLU, dropout | Helps prevent mode collapse |
| Loss Function | Start with binary cross-entropy | Consider Wasserstein loss for stability |
| Optimizer | Adam with low learning rate | lr=0.0002, beta1=0.5 often works well |
| Batch Size | 32-128 depending on GPU memory | Larger batches for stability |
| Latent Dimension | 100-512 dimensions | Balance expressiveness and complexity |
| Normalization | Normalize data to [-1, 1] | Works well with tanh output activation |
| Training Ratio | Train discriminator more than generator | n_critic=5 for WGAN |
| Monitoring | Monitor both losses | Adversarial losses oscillate; watch for divergence or a loss collapsing to zero |
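
Pulling together the optimizer and normalization rows from the table, a typical setup (these are commonly cited defaults, not hard requirements) looks like:

# Commonly used DCGAN-style defaults from the table above
d_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
g_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)

# Normalize image data to [-1, 1] to match a tanh generator output
# (X_train: uint8 image array, assumed)
X_train = (X_train.astype('float32') - 127.5) / 127.5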

Common Pitfalls and Solutions

| Pitfall | Solution | Example |
| --- | --- | --- |
| Mode Collapse | Use minibatch discrimination, feature matching | Add minibatch layer to discriminator |
| Training Instability | Use Wasserstein loss, gradient penalty | Switch to WGAN-GP |
| Vanishing Gradients | Use label smoothing, alternative losses | Apply label smoothing |
| Poor Generation Quality | Increase model capacity, train longer | Add more layers to generator |
| Slow Convergence | Adjust learning rate, use momentum | Use Adam optimizer with lr=0.0002 |
| Discriminator Too Strong | Train generator more, reduce discriminator capacity | Reduce discriminator layers |
| Generator Too Weak | Increase generator capacity | Add more layers to generator |
| Overfitting | Add regularization, early stopping | Add dropout with p=0.3 |

GAN Evaluation Metrics

Quantitative Metrics

| Metric | Description | Formula/Implementation |
| --- | --- | --- |
| Inception Score (IS) | Measures quality and diversity of generated images | exp(E_x[KL(p(y∣x) ∥ p(y))]) |
| Fréchet Inception Distance (FID) | Measures similarity between real and generated distributions | ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^½) |
| Kernel Inception Distance (KID) | Alternative to FID, more robust to small sample sizes | MMD² between real and generated features |
| Precision and Recall | Measures quality (precision) and diversity (recall) | Based on manifold estimation |
| Perceptual Path Length | Measures smoothness of latent space | Expected perceptual distance between images generated from nearby latent points |
| Linear Separability | Measures disentanglement of latent factors | Accuracy of linear classifier on latent factors |
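
The KID row above refers to the squared maximum mean discrepancy (MMD²) between Inception features; a minimal unbiased estimator with the polynomial kernel from the KID paper is sketched below. Features are assumed to come from the same InceptionV3 extractor used for FID.

# KID sketch: unbiased MMD^2 with the polynomial kernel k(x,y) = (x.y/d + 1)^3
import numpy as np

def polynomial_kernel(x, y):
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def calculate_kid(real_features, generated_features):
    k_rr = polynomial_kernel(real_features, real_features)
    k_gg = polynomial_kernel(generated_features, generated_features)
    k_rg = polynomial_kernel(real_features, generated_features)
    m, n = k_rr.shape[0], k_gg.shape[0]
    # Unbiased estimate: drop diagonal terms of the within-set kernels
    return ((k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
            + (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
            - 2.0 * k_rg.mean())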

Qualitative Evaluation

  • Visual Inspection: Manually examine generated samples
  • Nearest Neighbors: Compare generated samples to training data
  • Interpolation: Test smoothness of latent space interpolation
  • Attribute Manipulation: Test controllability of generation
  • User Studies: Human evaluation of generated samples
# Inception Score implementation
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
import numpy as np

def calculate_inception_score(images, n_split=10, eps=1E-16):
    """Calculate Inception Score for generated images"""
    # Load InceptionV3 model (expects 299x299 RGB inputs)
    model = InceptionV3()

    # Resize and preprocess images
    images = tf.image.resize(images, (299, 299)).numpy()
    images = preprocess_input(images)

    # Get class probability predictions
    preds = model.predict(images)

    # Split into groups
    split_scores = []
    n_part = images.shape[0] // n_split

    for i in range(n_split):
        ix_start, ix_end = i * n_part, (i+1) * n_part
        p_yx = preds[ix_start:ix_end]
        p_y = np.expand_dims(p_yx.mean(axis=0), 0)
        kl_d = p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))
        sum_kl_d = kl_d.sum(axis=1)
        avg_kl_d = np.mean(sum_kl_d)
        split_scores.append(np.exp(avg_kl_d))

    is_score = np.mean(split_scores)
    is_std = np.std(split_scores)

    return is_score, is_std

# Fréchet Inception Distance implementation
from scipy.linalg import sqrtm

def calculate_fid(real_images, generated_images):
    """Calculate Fréchet Inception Distance"""
    # Load InceptionV3 feature extractor (expects 299x299 RGB inputs)
    model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3))

    # Resize and preprocess images
    real_images = preprocess_input(tf.image.resize(real_images, (299, 299)).numpy())
    generated_images = preprocess_input(tf.image.resize(generated_images, (299, 299)).numpy())

    # Get pooled Inception features
    real_features = model.predict(real_images)
    generated_features = model.predict(generated_images)

    # Calculate mean and covariance
    mu1, sigma1 = real_features.mean(axis=0), np.cov(real_features, rowvar=False)
    mu2, sigma2 = generated_features.mean(axis=0), np.cov(generated_features, rowvar=False)

    # Calculate FID
    ssdiff = np.sum((mu1 - mu2)**2.0)
    covmean = sqrtm(sigma1.dot(sigma2))

    if np.iscomplexobj(covmean):
        covmean = covmean.real

    fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

    return fid

GAN in Practice

Case Study: Face Generation

# Face generation with DCGAN
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks
import numpy as np
import matplotlib.pyplot as plt

# Load CelebA dataset (conceptual)
# In practice, you would load actual face images
def load_celeba():
    # This is a placeholder - in practice you would load actual face images
    (x_train, _), (_, _) = tf.keras.datasets.cifar10.load_data()
    x_train = tf.image.resize(x_train, (64, 64)).numpy()  # Match the 64x64 generator
    x_train = (x_train - 127.5) / 127.5  # Normalize to [-1, 1]
    return x_train

# Create DCGAN
latent_dim = 100

generator = create_dcgan_generator(latent_dim)
discriminator = create_dcgan_discriminator()
gan = GAN(generator, discriminator)

# Train
X_train = load_celeba()
gan.train(X_train, epochs=20000, batch_size=64, latent_dim=latent_dim)

# Generate faces
n = 10  # Number of faces to generate
noise = np.random.normal(0, 1, (n, latent_dim))
generated_faces = generator.predict(noise)

# Display generated faces
plt.figure(figsize=(20, 4))
for i in range(n):
    ax = plt.subplot(2, n//2, i + 1)
    plt.imshow((generated_faces[i] + 1) / 2)  # Scale back to [0, 1]
    plt.axis('off')
plt.suptitle('Generated Faces')
plt.show()

# Latent space interpolation
def interpolate_faces(generator, n=10):
    """Interpolate between two random faces"""
    # Generate two random latent vectors
    z1 = np.random.normal(0, 1, (1, latent_dim))
    z2 = np.random.normal(0, 1, (1, latent_dim))

    # Create interpolation
    interpolated = []
    for alpha in np.linspace(0, 1, n):
        z = alpha * z1 + (1 - alpha) * z2
        generated = generator.predict(z)
        interpolated.append(generated[0])

    # Display interpolation
    plt.figure(figsize=(20, 2))
    for i in range(n):
        ax = plt.subplot(1, n, i + 1)
        plt.imshow((interpolated[i] + 1) / 2)
        plt.axis('off')
    plt.suptitle('Latent Space Interpolation')
    plt.show()

interpolate_faces(generator)

Case Study: Art Generation

# Art generation with StyleGAN (conceptual)
class ArtGAN:
    def __init__(self):
        self.latent_dim = 512
        self.generator = self._create_stylegan_generator()
        self.discriminator = self._create_stylegan_discriminator()

    def _create_stylegan_generator(self):
        """Create StyleGAN generator (simplified)"""
        # Mapping network
        z_input = layers.Input(shape=(self.latent_dim,))
        x = layers.Dense(512)(z_input)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dense(512)(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dense(512)(x)
        x = layers.LeakyReLU(0.2)(x)
        w = layers.Dense(512)(x)

        # Synthesis network
        inputs = layers.Input(shape=(1, 1, 512))
        x = layers.Conv2DTranspose(512, (4, 4), use_bias=False)(inputs)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        # Style modulation (simplified): scales features as x * (1 + style)
        style = layers.Dense(512)(w)
        style = layers.Reshape((1, 1, 512))(style)
        x = x * style + x

        # Upsample blocks
        for i in range(4):
            x = layers.UpSampling2D()(x)
            x = layers.Conv2D(512 // (2**i), (3, 3), padding='same', use_bias=False)(x)
            x = layers.BatchNormalization()(x)
            x = layers.LeakyReLU(0.2)(x)

            # Inject per-feature noise (GaussianNoise already returns x + noise)
            x = layers.GaussianNoise(0.1)(x)

        # Output
        outputs = layers.Conv2D(3, (1, 1), activation='tanh')(x)

        return models.Model([z_input, inputs], outputs)

    def _create_stylegan_discriminator(self):
        """Create StyleGAN discriminator"""
        inputs = layers.Input(shape=(64, 64, 3))
        x = layers.Conv2D(64, (4, 4), strides=2, padding='same')(inputs)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(128, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(256, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Conv2D(512, (4, 4), strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

        x = layers.Flatten()(x)
        outputs = layers.Dense(1)(x)

        return models.Model(inputs, outputs)

    def generate_art(self, n_samples):
        """Generate art samples"""
        z = np.random.normal(0, 1, (n_samples, self.latent_dim))
        constant_input = np.random.normal(0, 1, (n_samples, 1, 1, 512))
        generated = self.generator.predict([z, constant_input])
        return (generated + 1) / 2  # Scale to [0, 1]

Future Directions

  • Stable Training: More robust and stable GAN training methods
  • High-Fidelity Generation: Generating extremely high-resolution images
  • Disentangled Representations: Learning independent, interpretable factors
  • Controllable Generation: Precise control over generated outputs
  • Few-Shot Generation: GANs that learn from few examples
  • 3D Generation: Generating 3D objects and scenes
  • Video Generation: Generating realistic videos
  • Neuromorphic GANs: Brain-inspired GAN architectures
  • Quantum GANs: GANs for quantum computing
  • Explainable GANs: More interpretable GAN architectures
  • Energy-Efficient GANs: Green computing approaches
  • Multimodal GANs: GANs for multiple data modalities
  • Continual Learning GANs: GANs that learn continuously
  • Self-Supervised GANs: GANs for self-supervised learning
  • Ethical GANs: GANs with built-in ethical constraints

External Resources