JAX
High-performance numerical computing library for Python with automatic differentiation and GPU/TPU acceleration.
What is JAX?
JAX is a Python library designed for high-performance numerical computing and machine learning research. Developed by Google, JAX combines the ease of use of NumPy with powerful features like automatic differentiation, just-in-time compilation, and GPU/TPU acceleration. It's particularly well-suited for developing and training machine learning models, especially in research settings where flexibility and performance are crucial.
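The snippet below is a minimal sketch of that workflow, assuming only that JAX is installed; the `loss` function is an illustrative toy objective, not part of any JAX API. It shows the three pieces the rest of this page builds on: NumPy-style arrays, `jax.grad`, and `jax.jit`.

```python
import jax
import jax.numpy as jnp

def loss(w):
    # A toy scalar objective over a parameter vector (illustrative only)
    return jnp.sum(jnp.tanh(w) ** 2)

grad_loss = jax.grad(loss)        # reverse-mode derivative of loss w.r.t. w
fast_grad = jax.jit(grad_loss)    # compile the gradient computation with XLA

w = jnp.array([0.1, -0.5, 2.0])
print(loss(w))        # forward value
print(fast_grad(w))   # gradient, computed by the compiled function
```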
Key Concepts
JAX Architecture
graph TD
A[JAX] --> B[NumPy-like API]
A --> C[Automatic Differentiation]
A --> D[Just-In-Time Compilation]
A --> E[Hardware Acceleration]
A --> F[Function Transformations]
A --> G[Parallel Computing]
B --> B1[NumPy Compatibility]
B --> B2[Array Operations]
B --> B3[Linear Algebra]
B --> B4[Random Number Generation]
C --> C1[Forward-mode AD]
C --> C2[Reverse-mode AD]
C --> C3[Gradients]
C --> C4[Jacobians & Hessians]
D --> D1[XLA Compiler]
D --> D2[Performance Optimization]
D --> D3[Device Placement]
D --> D4[Memory Management]
E --> E1[GPU Support]
E --> E2[TPU Support]
E --> E3[Multi-device Computing]
E --> E4[Distributed Computing]
F --> F1[jit - Just-In-Time Compilation]
F --> F2[grad - Automatic Differentiation]
F --> F3[vmap - Vectorization]
F --> F4[pmap - Parallelization]
G --> G1[Single-Program Multiple-Data]
G --> G2[Collective Operations]
G --> G3[Device Communication]
G --> G4[Scalable Computing]
style A fill:#4285F4,stroke:#333
style B fill:#34A853,stroke:#333
style C fill:#EA4335,stroke:#333
style D fill:#FBBC05,stroke:#333
style E fill:#673AB7,stroke:#333
style F fill:#FF6D01,stroke:#333
style G fill:#00BCD4,stroke:#333
Core Components
- NumPy-like API: Familiar interface for numerical computing
- Autograd: Automatic differentiation for gradients
- XLA Compiler: Just-in-time compilation for performance
- GPU/TPU Support: Hardware acceleration
- Function Transformations: Composable transformations (jit, grad, vmap, pmap) that can be freely stacked (see the sketch after this list)
- Parallel Computing: Multi-device and distributed computing
- Random Number Generation: Deterministic random operations
- Array Operations: Efficient array manipulations
- Linear Algebra: Optimized linear algebra operations
- Automatic Vectorization: Automatic batching of operations
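As a small illustration of how these components combine, the sketch below stacks `grad`, `vmap`, and `jit` to get per-example gradients of a toy squared-error loss; `predict` and `loss` are illustrative helpers, not JAX functions.

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    return jnp.dot(w, x)

def loss(w, x, y):
    return (predict(w, x) - y) ** 2

# Per-example gradient w.r.t. w, vectorized over a batch of (x, y) pairs,
# then compiled as a single XLA program.
per_example_grads = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0, 0)))

w = jnp.ones(3)
xs = jnp.arange(12.0).reshape(4, 3)   # batch of 4 inputs
ys = jnp.array([1.0, 2.0, 3.0, 4.0])
print(per_example_grads(w, xs, ys).shape)  # (4, 3): one gradient per example
```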
Applications
Machine Learning Domains
- Deep Learning: Neural network training and research
- Reinforcement Learning: Policy optimization and Q-learning
- Probabilistic Programming: Bayesian inference and MCMC
- Scientific Computing: Physics simulations and modeling
- Computer Vision: Image processing and analysis
- Natural Language Processing: Language models and embeddings
- Optimization: Gradient-based optimization algorithms
- Numerical Analysis: Differential equations and simulations
- Research: Cutting-edge ML research and experimentation
- Custom Models: Building novel machine learning architectures
Industry Applications
- Research: Academic and industrial ML research
- Healthcare: Medical imaging and predictive modeling
- Finance: Quantitative analysis and risk modeling
- Robotics: Control systems and reinforcement learning
- Autonomous Vehicles: Perception and decision systems
- Drug Discovery: Molecular modeling and simulation
- Climate Science: Weather prediction and climate modeling
- Energy: Smart grid optimization and energy forecasting
- Manufacturing: Predictive maintenance and quality control
- Technology: AI research and development
Implementation
Basic JAX Example
# Basic JAX example
import jax
import jax.numpy as jnp
import numpy as np
import time
print("Basic JAX Example...")
# Check available devices
print("\nAvailable devices:")
print(jax.devices())
# 1. NumPy-like operations
print("\n1. NumPy-like Operations...")
# Create arrays
x_np = np.array([1.0, 2.0, 3.0, 4.0])
x_jax = jnp.array([1.0, 2.0, 3.0, 4.0])
print(f"NumPy array: {x_np}, type: {type(x_np)}")
print(f"JAX array: {x_jax}, type: {type(x_jax)}")
# Basic operations
print("\nBasic operations:")
print(f"Addition: {x_jax + 2}")
print(f"Multiplication: {x_jax * 2}")
print(f"Exponentiation: {jnp.exp(x_jax)}")
print(f"Dot product: {jnp.dot(x_jax, x_jax)}")
# 2. Automatic differentiation
print("\n2. Automatic Differentiation...")
def f(x):
"""A simple function to differentiate"""
return 3 * x**2 + 2 * x + 1
# Compute gradient
dfdx = jax.grad(f)
print(f"f(x) = 3x² + 2x + 1")
print(f"df/dx at x=2: {dfdx(2.0)}")
# Compute second derivative
d2fdx2 = jax.grad(jax.grad(f))
print(f"d²f/dx² at x=2: {d2fdx2(2.0)}")
# Compute value and gradient together
f_and_df = jax.value_and_grad(f)
value, gradient = f_and_df(2.0)
print(f"f(2) = {value}, df/dx at x=2 = {gradient}")
# 3. Just-In-Time compilation
print("\n3. Just-In-Time Compilation...")
@jax.jit
def jitted_function(x):
"""A function that will be JIT compiled"""
return jnp.sin(x) * jnp.cos(x) + jnp.exp(-x**2)
# First call triggers compilation
start = time.time()
jitted_function(x_jax).block_until_ready()
print(f"First call (includes compilation): {time.time() - start:.6f}s")
# Subsequent calls reuse the compiled program
start = time.time()
jitted_function(x_jax).block_until_ready()
print(f"Subsequent call (compiled): {time.time() - start:.6f}s")
# Compare with non-jitted
def non_jitted_function(x):
    """Non-jitted version for comparison"""
    return jnp.sin(x) * jnp.cos(x) + jnp.exp(-x**2)
start = time.time()
non_jitted_function(x_jax).block_until_ready()
print(f"Non-jitted call: {time.time() - start:.6f}s")
# 4. Vectorization with vmap
print("\n4. Vectorization with vmap...")
def scalar_function(x, y):
"""A scalar function that operates on single values"""
return jnp.sin(x) + jnp.cos(y)
# Vectorize the function
vectorized_function = jax.vmap(scalar_function)
# Create input arrays
x_vec = jnp.array([1.0, 2.0, 3.0, 4.0])
y_vec = jnp.array([0.5, 1.5, 2.5, 3.5])
# Apply vectorized function
result = vectorized_function(x_vec, y_vec)
print(f"Input x: {x_vec}")
print(f"Input y: {y_vec}")
print(f"Result: {result}")
# Compare with manual vectorization
manual_result = jnp.array([scalar_function(x, y) for x, y in zip(x_vec, y_vec)])
print(f"Manual result: {manual_result}")
print(f"Results match: {jnp.allclose(result, manual_result)}")
# 5. Parallelization with pmap
print("\n5. Parallelization with pmap...")
# Check if multiple devices are available
if len(jax.devices()) > 1:
print(f"Multiple devices available: {jax.devices()}")
def parallel_function(x):
"""Function to parallelize across devices"""
return jnp.sin(x) * jnp.cos(x) + jax.lax.psum(x, 'devices')
# Replicate input across devices
x_replicated = jax.device_put_replicated(x_jax, jax.devices())
# Parallelize the function
parallelized_function = jax.pmap(parallel_function, axis_name='devices')
# Run parallel computation
result = parallelized_function(x_replicated)
print(f"Parallel result: {result}")
else:
print("Single device available, parallelization not demonstrated")
# 6. Random number generation
print("\n6. Random Number Generation...")
# JAX random numbers are deterministic and require explicit keys
key = jax.random.PRNGKey(42)
print(f"Random key: {key}")
# Generate random numbers
random_numbers = jax.random.normal(key, (5,))
print(f"Random normal numbers: {random_numbers}")
# Split key for multiple operations
key1, key2 = jax.random.split(key)
random_uniform = jax.random.uniform(key1, (3,))
random_int = jax.random.randint(key2, (3,), 0, 10)
print(f"Random uniform numbers: {random_uniform}")
print(f"Random integers: {random_int}")
# 7. Linear algebra operations
print("\n7. Linear Algebra Operations...")
# Create matrices
A = jnp.array([[1.0, 2.0], [3.0, 4.0]])
B = jnp.array([[5.0, 6.0], [7.0, 8.0]])
print(f"Matrix A:\n{A}")
print(f"Matrix B:\n{B}")
# Matrix operations
print("\nMatrix operations:")
print(f"Matrix multiplication:\n{jnp.dot(A, B)}")
print(f"Matrix addition:\n{A + B}")
print(f"Matrix inverse:\n{jnp.linalg.inv(A)}")
print(f"Eigenvalues of A: {jnp.linalg.eigvals(A)}")
print(f"Determinant of A: {jnp.linalg.det(A)}")
# 8. Array manipulation
print("\n8. Array Manipulation...")
# Create a 2D array
arr = jnp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"Original array:\n{arr}")
# Array operations
print("\nArray operations:")
print(f"Reshape: {arr.reshape(9,)}")
print(f"Transpose:\n{arr.T}")
print(f"Slice: {arr[1:, 1:]}")
print(f"Concatenate: {jnp.concatenate([arr, arr], axis=0)}")
print(f"Stack: {jnp.stack([arr, arr], axis=0)}")
# 9. GPU acceleration
print("\n9. GPU Acceleration...")
# Check if GPU is available
if jax.devices()[0].platform == 'gpu':
print("GPU is available")
# Create a large array
large_array = jax.random.normal(key, (1000, 1000))
# Time GPU computation
def gpu_computation(x):
return jnp.sin(x) @ jnp.cos(x.T)
# JIT compile for GPU
gpu_computation_jitted = jax.jit(gpu_computation)
    # Time the computation (warm up first so compilation is not counted)
    gpu_computation_jitted(large_array).block_until_ready()
    start = time.time()
    gpu_computation_jitted(large_array).block_until_ready()
    print(f"GPU computation time: {time.time() - start:.6f} seconds")
else:
print("GPU not available, using CPU")
print("GPU acceleration would be demonstrated here with actual GPU")
# 10. Neural network example
print("\n10. Neural Network Example...")
def simple_nn(params, x):
"""A simple neural network with one hidden layer"""
w1, b1, w2, b2 = params
hidden = jnp.tanh(jnp.dot(x, w1) + b1)
return jnp.dot(hidden, w2) + b2
# Initialize parameters
key1, key2, key3, key4 = jax.random.split(key, 4)
input_dim = 2
hidden_dim = 5
output_dim = 1
params = [
jax.random.normal(key1, (input_dim, hidden_dim)), # w1
jax.random.normal(key2, (hidden_dim,)), # b1
jax.random.normal(key3, (hidden_dim, output_dim)), # w2
jax.random.normal(key4, (output_dim,)) # b2
]
# Create some input data
x = jnp.array([[0.5, -0.3], [1.2, 0.8], [-0.7, 0.1]])
# Forward pass
output = simple_nn(params, x)
print(f"Input: {x}")
print(f"Output: {output}")
# Compute gradient
grad_fn = jax.grad(lambda p, x: jnp.sum(simple_nn(p, x)))
grads = grad_fn(params, x)
print("\nGradients:")
for i, grad in enumerate(grads):
    print(f"Parameter {i+1} gradient shape: {grad.shape}")
Automatic Differentiation Example
# Automatic differentiation example with JAX
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
print("\nAutomatic Differentiation Example...")
# 1. Basic differentiation
print("1. Basic Differentiation...")
def quadratic(x):
"""Quadratic function: f(x) = ax² + bx + c"""
a, b, c = 2.0, 3.0, 1.0
return a * x**2 + b * x + c
# Compute first derivative
grad_quadratic = jax.grad(quadratic)
x_vals = jnp.linspace(-5, 5, 100)
y_vals = quadratic(x_vals)
dy_vals = jax.vmap(grad_quadratic)(x_vals)  # grad needs scalar outputs, so vectorize with vmap
print(f"f(x) = 2x² + 3x + 1")
print(f"f'(x) = {grad_quadratic(1.0)} at x=1.0")
# Plot function and its derivative
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(x_vals, y_vals)
plt.title('Quadratic Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.subplot(1, 2, 2)
plt.plot(x_vals, dy_vals)
plt.title('Derivative of Quadratic Function')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.tight_layout()
plt.show()
# 2. Higher-order derivatives
print("\n2. Higher-Order Derivatives...")
def cubic(x):
"""Cubic function: f(x) = ax³ + bx² + cx + d"""
a, b, c, d = 1.0, 2.0, 3.0, 4.0
return a * x**3 + b * x**2 + c * x + d
# Compute first, second, and third derivatives
grad_cubic = jax.grad(cubic)
hessian_cubic = jax.grad(jax.grad(cubic))
third_deriv_cubic = jax.grad(jax.grad(jax.grad(cubic)))
print(f"f(x) = x³ + 2x² + 3x + 4")
print(f"f'(x) = {grad_cubic(1.0)} at x=1.0")
print(f"f''(x) = {hessian_cubic(1.0)} at x=1.0")
print(f"f'''(x) = {third_deriv_cubic(1.0)} at x=1.0")
# Plot function and derivatives
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
plt.plot(x_vals, cubic(x_vals))
plt.title('Cubic Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.subplot(2, 2, 2)
plt.plot(x_vals, jax.vmap(grad_cubic)(x_vals))
plt.title('First Derivative')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.subplot(2, 2, 3)
plt.plot(x_vals, jax.vmap(hessian_cubic)(x_vals))
plt.title('Second Derivative')
plt.xlabel('x')
plt.ylabel('f\'\'(x)')
plt.subplot(2, 2, 4)
plt.plot(x_vals, jax.vmap(third_deriv_cubic)(x_vals))
plt.title('Third Derivative')
plt.xlabel('x')
plt.ylabel('f\'\'\'(x)')
plt.tight_layout()
plt.show()
# 3. Partial derivatives
print("\n3. Partial Derivatives...")
def multivariate_function(x, y):
"""Multivariate function: f(x,y) = x²y + y²sin(x)"""
return x**2 * y + y**2 * jnp.sin(x)
# Compute partial derivatives
grad_multivariate = jax.grad(multivariate_function, argnums=(0, 1))
partial_x, partial_y = grad_multivariate(1.0, 2.0)
print(f"f(x,y) = x²y + y²sin(x)")
print(f"∂f/∂x at (1,2): {partial_x}")
print(f"∂f/∂y at (1,2): {partial_y}")
# Compute the Jacobian with respect to both arguments
jacobian_fn = jax.jacobian(multivariate_function, argnums=(0, 1))
jacobian = jacobian_fn(1.0, 2.0)
print(f"Jacobian at (1,2): {jacobian}")
# 4. Gradient of vector-valued functions
print("\n4. Gradient of Vector-Valued Functions...")
def vector_valued_function(x):
"""Vector-valued function: f(x) = [x², sin(x), exp(-x)]"""
return jnp.array([x**2, jnp.sin(x), jnp.exp(-x)])
# Compute Jacobian
jacobian_vector = jax.jacobian(vector_valued_function)
x_test = 1.0
jacobian = jacobian_vector(x_test)
print(f"f(x) = [x², sin(x), exp(-x)]")
print(f"Jacobian at x={x_test}:")
print(jacobian)
# 5. Hessian computation
print("\n5. Hessian Computation...")
def rosenbrock(x):
"""Rosenbrock function: f(x,y) = (a-x)² + b(y-x²)²"""
a, b = 1.0, 100.0
return (a - x[0])**2 + b * (x[1] - x[0]**2)**2
# Compute gradient
grad_rosenbrock = jax.grad(rosenbrock)
# Compute Hessian
hessian_rosenbrock = jax.hessian(rosenbrock)
x_test = jnp.array([1.5, 1.5])
gradient = grad_rosenbrock(x_test)
hessian = hessian_rosenbrock(x_test)
print(f"Rosenbrock function at {x_test}: {rosenbrock(x_test)}")
print(f"Gradient at {x_test}: {gradient}")
print(f"Hessian at {x_test}:")
print(hessian)
# 6. Differentiating through control flow
print("\n6. Differentiating Through Control Flow...")
@jax.jit
def piecewise_function(x):
"""Piecewise function with control flow"""
    return jax.lax.cond(x < 0,
                        lambda x: x**2,        # x < 0: x²
                        lambda x: jnp.sin(x),  # x ≥ 0: sin(x)
                        x)                     # operand passed to the selected branch
# Compute gradient
grad_piecewise = jax.grad(piecewise_function)
x_neg = -1.0
x_pos = 1.0
print(f"f(x) = x² for x < 0, sin(x) for x ≥ 0")
print(f"f'({x_neg}) = {grad_piecewise(x_neg)}")
print(f"f'({x_pos}) = {grad_piecewise(x_pos)}")
# Plot piecewise function and its derivative
x_vals_neg = jnp.linspace(-5, 0, 50)
x_vals_pos = jnp.linspace(0, 5, 50)
x_vals_all = jnp.concatenate([x_vals_neg, x_vals_pos])
y_vals = jax.vmap(piecewise_function)(x_vals_all)
dy_vals = jax.vmap(grad_piecewise)(x_vals_all)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(x_vals_all, y_vals)
plt.title('Piecewise Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.subplot(1, 2, 2)
plt.plot(x_vals_all, dy_vals)
plt.title('Derivative of Piecewise Function')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.tight_layout()
plt.show()
# 7. Differentiating through loops
print("\n7. Differentiating Through Loops...")
def loop_sum(x):
    """Sum of the elements of x, accumulated with jax.lax.fori_loop"""
    def body_fun(i, val):
        return val + x[i]
    return jax.lax.fori_loop(0, len(x), body_fun, 0.0)
# Evaluate the loop-based sum
x = jnp.array([1.0, 2.0, 3.0, 4.0])
result = loop_sum(x)
print(f"Input: {x}")
print(f"Loop-based sum: {result}")
# Compute gradient (d sum / d x_i = 1 for every element)
grad_loop_sum = jax.grad(loop_sum)
gradient = grad_loop_sum(x)
print(f"Gradient: {gradient}")
# 8. Custom gradient functions
print("\n8. Custom Gradient Functions...")
@jax.custom_vjp
def custom_function(x):
"""Function with custom gradient"""
return jnp.where(x > 0, jnp.log(x + 1), -jnp.log(-x + 1))
# Define forward and backward passes
def custom_forward(x):
"""Forward pass"""
return custom_function(x), x # Return primal and residual
def custom_backward(res, g):
"""Backward pass"""
x = res
    grad = g * jnp.where(x > 0, 1/(x + 1), 1/(1 - x))  # derivative of -log(1 - x) is 1/(1 - x)
return (grad,)
custom_function.defvjp(custom_forward, custom_backward)
# Test the custom function
x_test = jnp.array([-2.0, -0.5, 0.5, 2.0])
y_test = custom_function(x_test)
grad_test = jax.vmap(jax.grad(custom_function))(x_test)  # grad needs a scalar output, so vmap over the batch
print(f"Input: {x_test}")
print(f"Output: {y_test}")
print(f"Gradient: {grad_test}")
# Plot custom function and its gradient
x_vals = jnp.linspace(-3, 3, 100)
y_vals = jax.vmap(custom_function)(x_vals)
dy_vals = jax.vmap(jax.grad(custom_function))(x_vals)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(x_vals, y_vals)
plt.title('Custom Function')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.subplot(1, 2, 2)
plt.plot(x_vals, dy_vals)
plt.title('Gradient of Custom Function')
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.tight_layout()
plt.show()
Just-In-Time Compilation Example
# Just-In-Time compilation example with JAX
import jax
import jax.numpy as jnp
import time
import matplotlib.pyplot as plt
print("\nJust-In-Time Compilation Example...")
# 1. Basic JIT example
print("1. Basic JIT Example...")
def slow_function(x):
"""A function that will benefit from JIT compilation"""
result = 0.0
for i in range(100):
result += jnp.sin(x) * jnp.cos(x) + jnp.exp(-x**2) + jnp.tanh(x)
return result
# JIT compile the function
fast_function = jax.jit(slow_function)
# Create input
x = jnp.array([1.0, 2.0, 3.0, 4.0])
# First call (compilation)
print("First call (compilation):")
start_time = time.time()
result1 = fast_function(x).block_until_ready()
compilation_time = time.time() - start_time
print(f"Result: {result1}")
print(f"Compilation time: {compilation_time:.6f} seconds")
# Subsequent calls (compiled)
print("\nSubsequent calls (compiled):")
start_time = time.time()
result2 = fast_function(x).block_until_ready()
execution_time = time.time() - start_time
print(f"Result: {result2}")
print(f"Execution time: {execution_time:.6f} seconds")
# Compare with non-jitted
print("\nNon-jitted function:")
start_time = time.time()
result3 = slow_function(x).block_until_ready()
non_jitted_time = time.time() - start_time
print(f"Result: {result3}")
print(f"Execution time: {non_jitted_time:.6f} seconds")
print(f"\nSpeedup: {non_jitted_time/execution_time:.2f}x")
# 2. JIT with control flow
print("\n2. JIT with Control Flow...")
@jax.jit
def jitted_control_flow(x):
"""JITted function with control flow"""
    return jax.lax.cond(x > 0,
                        lambda x: jnp.sin(x) + jnp.cos(x),  # x > 0
                        lambda x: x**2 + jnp.exp(-x**2),    # x ≤ 0
                        x)
# Test the function
x_pos = jnp.array(1.0)
x_neg = jnp.array(-1.0)
print(f"jitted_control_flow({x_pos}): {jitted_control_flow(x_pos)}")
print(f"jitted_control_flow({x_neg}): {jitted_control_flow(x_neg)}")
# 3. JIT with loops
print("\n3. JIT with Loops...")
@jax.jit
def jitted_loop(n, x):
"""JITted function with a loop"""
def body_fun(i, val):
return val + jnp.sin(x) * i
return jax.lax.fori_loop(0, n, body_fun, 0.0)
# Test the function
n = 100
x = jnp.array(0.5)
result = jitted_loop(n, x)
print(f"jitted_loop({n}, {x}): {result}")
# 4. JIT compilation stages
print("\n4. JIT Compilation Stages...")
def complex_function(x):
"""A more complex function to demonstrate compilation stages"""
# Various operations that will be optimized
a = jnp.sin(x) * jnp.cos(x)
b = jnp.exp(-x**2)
c = jnp.tanh(x)
d = jnp.log(jnp.abs(x) + 1e-6)
e = jnp.sqrt(jnp.abs(x) + 1e-6)
# Some control flow
result = jax.lax.cond(x > 0,
lambda: a + b + c,
lambda: d + e)
return result
# JIT compile with stage tracking
print("Compiling complex_function...")
fast_complex = jax.jit(complex_function)
# First call (compilation)
print("First call (compilation):")
result = fast_complex(jnp.array(1.0)).block_until_ready()
print(f"Result: {result}")
# 5. JIT performance comparison
print("\n5. JIT Performance Comparison...")
# Create a range of input sizes
sizes = [10, 100, 1000, 10000]
jitted_times = []
non_jitted_times = []
def compute_function(x):
"""Function to benchmark"""
return jnp.sum(jnp.sin(x) * jnp.cos(x) + jnp.exp(-x**2))
# JIT compile
jitted_compute = jax.jit(compute_function)
for size in sizes:
x = jnp.linspace(0, 10, size)
    # Warm up so compilation time is not counted, then time the jitted version
    jitted_compute(x).block_until_ready()
    start_time = time.time()
    result = jitted_compute(x).block_until_ready()
    jitted_times.append(time.time() - start_time)
# Time non-jitted version
start_time = time.time()
result = compute_function(x).block_until_ready()
non_jitted_times.append(time.time() - start_time)
print(f"Size {size}: JIT={jitted_times[-1]:.6f}s, Non-JIT={non_jitted_times[-1]:.6f}s")
# Plot performance comparison
plt.figure(figsize=(10, 6))
plt.plot(sizes, jitted_times, label='JIT Compiled', marker='o')
plt.plot(sizes, non_jitted_times, label='Non-JIT', marker='o')
plt.title('JIT vs Non-JIT Performance')
plt.xlabel('Input Size')
plt.ylabel('Time (seconds)')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.grid(True)
plt.show()
# 6. JIT with static arguments
print("\n6. JIT with Static Arguments...")
from functools import partial

@partial(jax.jit, static_argnums=1)  # n is a static (compile-time) argument
def jitted_with_static(x, n):
"""Function with static argument"""
def body_fun(i, val):
return val + jnp.sin(x) * i
return jax.lax.fori_loop(0, n, body_fun, 0.0)
# Test with different static arguments
print("Testing with static argument n=100:")
result1 = jitted_with_static(jnp.array(0.5), 100)
print(f"Result: {result1}")
print("\nTesting with static argument n=200:")
result2 = jitted_with_static(jnp.array(0.5), 200)
print(f"Result: {result2}")
# Note: Changing static arguments triggers recompilation
print("\nNote: Changing static arguments triggers recompilation")
# 7. JIT compilation cache
print("\n7. JIT Compilation Cache...")
# JAX caches compiled functions
print("JAX caches compiled functions to avoid recompilation")
# Create multiple functions with same signature
@jax.jit
def function1(x):
return jnp.sin(x) + jnp.cos(x)
@jax.jit
def function2(x):
return jnp.sin(x) * jnp.cos(x)
# First calls will compile
print("First calls (compilation):")
start_time = time.time()
result1 = function1(jnp.array(1.0)).block_until_ready()
time1 = time.time() - start_time
start_time = time.time()
result2 = function2(jnp.array(1.0)).block_until_ready()
time2 = time.time() - start_time
print(f"function1 compilation time: {time1:.6f}s")
print(f"function2 compilation time: {time2:.6f}s")
# Subsequent calls will be faster
print("\nSubsequent calls (cached):")
start_time = time.time()
result1 = function1(jnp.array(1.0)).block_until_ready()
time1 = time.time() - start_time
start_time = time.time()
result2 = function2(jnp.array(1.0)).block_until_ready()
time2 = time.time() - start_time
print(f"function1 execution time: {time1:.6f}s")
print(f"function2 execution time: {time2:.6f}s")
# 8. JIT limitations
print("\n8. JIT Limitations...")
print("JIT compilation has some limitations:")
print("- Python control flow (if/else, for, while) must be replaced with JAX primitives")
print("- Dynamic shapes can cause recompilation")
print("- Some Python features are not supported")
print("- Debugging can be more challenging")
# Example of JIT limitation
def problematic_function(x):
"""Function with unsupported Python features"""
# This would cause issues with JIT
# result = []
# for i in range(len(x)):
# if x[i] > 0:
# result.append(jnp.sin(x[i]))
# else:
# result.append(jnp.cos(x[i]))
# return jnp.array(result)
# Instead, use JAX primitives
return jax.lax.cond(x > 0, jnp.sin, jnp.cos, x)
# JIT compile the corrected version
jitted_problematic = jax.jit(jax.vmap(problematic_function))
x_test = jnp.array([-1.0, 0.5, 1.0, -0.5])
result = jitted_problematic(x_test)
print(f"Result with JAX primitives: {result}")
Neural Network Training Example
# Neural network training example with JAX
import jax
import jax.numpy as jnp
import optax # Optimization library for JAX
import matplotlib.pyplot as plt
import numpy as np  # used below when saving parameters as host arrays
print("\nNeural Network Training Example...")
# 1. Define a simple neural network
print("1. Defining a Simple Neural Network...")
def neural_network(params, x):
"""A simple feedforward neural network"""
w1, b1, w2, b2, w3, b3 = params
# First hidden layer
h1 = jnp.tanh(jnp.dot(x, w1) + b1)
# Second hidden layer
h2 = jnp.tanh(jnp.dot(h1, w2) + b2)
# Output layer
output = jnp.dot(h2, w3) + b3
return output
# 2. Initialize parameters
print("\n2. Initializing Parameters...")
def initialize_params(input_dim, hidden_dim1, hidden_dim2, output_dim, key):
"""Initialize neural network parameters"""
keys = jax.random.split(key, 6)
params = [
jax.random.normal(keys[0], (input_dim, hidden_dim1)) * jnp.sqrt(2.0 / input_dim), # w1
jnp.zeros(hidden_dim1), # b1
jax.random.normal(keys[1], (hidden_dim1, hidden_dim2)) * jnp.sqrt(2.0 / hidden_dim1), # w2
jnp.zeros(hidden_dim2), # b2
jax.random.normal(keys[2], (hidden_dim2, output_dim)) * jnp.sqrt(2.0 / hidden_dim2), # w3
jnp.zeros(output_dim) # b3
]
return params
# Network dimensions
input_dim = 1
hidden_dim1 = 32
hidden_dim2 = 32
output_dim = 1
# Initialize parameters
key = jax.random.PRNGKey(42)
params = initialize_params(input_dim, hidden_dim1, hidden_dim2, output_dim, key)
print(f"Parameter shapes:")
for i, p in enumerate(params):
    print(f"Parameter {i+1}: {p.shape}")
# 3. Define loss function
print("\n3. Defining Loss Function...")
def mse_loss(params, x, y_true):
"""Mean squared error loss"""
y_pred = neural_network(params, x)
return jnp.mean((y_pred - y_true)**2)
# 4. Create synthetic data
print("\n4. Creating Synthetic Data...")
def create_data(n_samples, key):
"""Create synthetic regression data"""
keys = jax.random.split(key, 2)
x = jax.random.uniform(keys[0], (n_samples, input_dim), minval=-5, maxval=5)
y = jnp.sin(x) * jnp.cos(x/2) + 0.1 * jax.random.normal(keys[1], (n_samples, output_dim))
return x, y
n_samples = 1000
x_train, y_train = create_data(n_samples, key)
x_test, y_test = create_data(200, jax.random.PRNGKey(43))
print(f"Training data shape: x={x_train.shape}, y={y_train.shape}")
print(f"Test data shape: x={x_test.shape}, y={y_test.shape}")
# Plot the data
plt.figure(figsize=(10, 6))
plt.scatter(x_train, y_train, alpha=0.5, label='Training data')
plt.scatter(x_test, y_test, alpha=0.5, label='Test data')
plt.title('Synthetic Regression Data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
# 5. Set up optimization
print("\n5. Setting Up Optimization...")
# Initialize optimizer
optimizer = optax.adam(learning_rate=0.01)
opt_state = optimizer.init(params)
# Define update function
@jax.jit
def update(params, opt_state, x, y):
"""Single optimization step"""
# Compute loss and gradients
loss, grads = jax.value_and_grad(mse_loss)(params, x, y)
# Update parameters
updates, opt_state = optimizer.update(grads, opt_state)
params = optax.apply_updates(params, updates)
return params, opt_state, loss
# 6. Training loop
print("\n6. Training Loop...")
n_epochs = 1000
batch_size = 32
loss_history = []
test_loss_history = []
# JIT compile the prediction function
predict = jax.jit(neural_network)
for epoch in range(n_epochs):
# Shuffle data
key, subkey = jax.random.split(key)
perm = jax.random.permutation(subkey, len(x_train))
x_shuffled = x_train[perm]
y_shuffled = y_train[perm]
# Mini-batch training
epoch_loss = 0.0
for i in range(0, len(x_train), batch_size):
x_batch = x_shuffled[i:i+batch_size]
y_batch = y_shuffled[i:i+batch_size]
params, opt_state, loss = update(params, opt_state, x_batch, y_batch)
epoch_loss += loss * len(x_batch)
epoch_loss /= len(x_train)
loss_history.append(epoch_loss)
# Compute test loss
test_loss = mse_loss(params, x_test, y_test)
test_loss_history.append(test_loss)
if epoch % 100 == 0:
print(f"Epoch {epoch}: Train Loss = {epoch_loss:.6f}, Test Loss = {test_loss:.6f}")
print(f"Final train loss: {loss_history[-1]:.6f}")
print(f"Final test loss: {test_loss_history[-1]:.6f}")
# 7. Plot training progress
print("\n7. Plotting Training Progress...")
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(loss_history, label='Training Loss')
plt.plot(test_loss_history, label='Test Loss')
plt.title('Training and Test Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
# Plot predictions
plt.subplot(1, 2, 2)
x_plot = jnp.linspace(-5, 5, 200).reshape(-1, 1)
y_pred = predict(params, x_plot)
plt.scatter(x_train, y_train, alpha=0.3, label='Training data')
plt.scatter(x_test, y_test, alpha=0.3, label='Test data')
plt.plot(x_plot, y_pred, 'r-', label='Model prediction')
plt.title('Model Predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()
# 8. Model evaluation
print("\n8. Model Evaluation...")
final_train_loss = mse_loss(params, x_train, y_train)
final_test_loss = mse_loss(params, x_test, y_test)
print(f"Final training loss: {final_train_loss:.6f}")
print(f"Final test loss: {final_test_loss:.6f}")
# 9. Gradient analysis
print("\n9. Gradient Analysis...")
# Compute gradients on a batch
x_batch = x_train[:32]
y_batch = y_train[:32]
_, grads = jax.value_and_grad(mse_loss)(params, x_batch, y_batch)
print("Gradient magnitudes:")
for i, grad in enumerate(grads):
    print(f"Parameter {i+1}: {jnp.linalg.norm(grad):.6f}")
# 10. Model saving and loading
print("\n10. Model Saving and Loading...")
# Save model parameters
saved_params = jax.tree_util.tree_map(lambda x: np.array(x), params)
# In a real scenario, you would save to disk:
# import pickle
# with open('model_params.pkl', 'wb') as f:
# pickle.dump(saved_params, f)
print("Model parameters saved (in memory)")
# Load model parameters
loaded_params = jax.tree_util.tree_map(jnp.array, saved_params)
# Verify loaded parameters
loaded_train_loss = mse_loss(loaded_params, x_train, y_train)
print(f"Loaded model training loss: {loaded_train_loss:.6f}")
print(f"Original model training loss: {final_train_loss:.6f}")
print(f"Losses match: {jnp.allclose(final_train_loss, loaded_train_loss)}")
Performance Optimization
JAX Performance Techniques
| Technique | Description | Use Case |
|---|---|---|
| JIT Compilation | Just-in-time compilation with XLA | Performance-critical functions |
| Vectorization | Automatic vectorization with vmap | Batch processing |
| Parallelization | Multi-device parallelization with pmap | Distributed computing |
| GPU/TPU Acceleration | Hardware acceleration | Large-scale computations |
| Memory Optimization | Efficient memory usage | Large arrays and models |
| Fusion | Operation fusion | Reducing memory bandwidth |
| Caching | Compilation caching | Repeated function calls |
| Static Arguments | Static argument optimization | Functions with fixed parameters |
| In-Place Updates | Functional array updates via .at[] indexing (see the sketch below) | Mutable state in loops |
| Custom Kernels | Custom GPU kernels | Specialized operations |
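The sketch below illustrates two rows of the table above, as a reader might first apply them: functional "in-place" updates with `.at[...]` and a static argument marked via `static_argnums`; `fill_with_sin` is a made-up helper for demonstration, not a JAX function.

```python
import jax
import jax.numpy as jnp
from functools import partial

@partial(jax.jit, static_argnums=1)  # n is static: its value is baked into the compiled program
def fill_with_sin(x, n):
    # JAX arrays are immutable; acc.at[i].set(v) returns an updated array,
    # which XLA can turn into a true in-place update inside the compiled loop.
    def body(i, acc):
        return acc.at[i].set(jnp.sin(x[i]))
    return jax.lax.fori_loop(0, n, body, jnp.zeros(n))

x = jnp.linspace(0.0, 1.0, 8)
print(fill_with_sin(x, 8))  # calling with a different n triggers a recompilation
```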
GPU Acceleration Example
# GPU acceleration example with JAX
import jax
import jax.numpy as jnp
import optax  # used in the neural network training benchmark below
import time
import matplotlib.pyplot as plt
print("\nGPU Acceleration Example...")
# Check available devices
print("Available devices:")
devices = jax.devices()
for i, device in enumerate(devices):
print(f"Device {i}: {device}")
# Check if GPU is available
gpu_available = any(d.platform == 'gpu' for d in devices)
print(f"\nGPU available: {gpu_available}")
if gpu_available:
print("GPU acceleration is available")
# Use the first GPU device
gpu_device = [d for d in devices if d.platform == 'gpu'][0]
print(f"Using GPU: {gpu_device}")
else:
print("GPU not available, using CPU")
# 1. Basic GPU operations
print("\n1. Basic GPU Operations...")
def basic_operations(x):
"""Basic array operations"""
return jnp.sin(x) + jnp.cos(x) * jnp.exp(-x**2)
# Create arrays of different sizes
sizes = [100, 1000, 10000, 100000, 1000000]
gpu_times = []
cpu_times = []
for size in sizes:
x = jnp.linspace(0, 10, size)
# Time GPU computation
if gpu_available:
x_gpu = jax.device_put(x, gpu_device)
start_time = time.time()
result_gpu = basic_operations(x_gpu).block_until_ready()
gpu_time = time.time() - start_time
gpu_times.append(gpu_time)
else:
gpu_time = float('nan')
gpu_times.append(gpu_time)
# Time CPU computation
x_cpu = jax.device_put(x, jax.devices('cpu')[0])
start_time = time.time()
result_cpu = basic_operations(x_cpu).block_until_ready()
cpu_time = time.time() - start_time
cpu_times.append(cpu_time)
print(f"Size {size}: GPU={gpu_time:.6f}s, CPU={cpu_time:.6f}s")
# Plot performance comparison
plt.figure(figsize=(10, 6))
plt.plot(sizes, gpu_times, label='GPU', marker='o')
plt.plot(sizes, cpu_times, label='CPU', marker='o')
plt.title('GPU vs CPU Performance')
plt.xlabel('Array Size')
plt.ylabel('Time (seconds)')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.grid(True)
plt.show()
# 2. Matrix operations
print("\n2. Matrix Operations...")
def matrix_operations(n):
"""Matrix operations benchmark"""
    key = jax.random.PRNGKey(42)
    key_a, key_b = jax.random.split(key)
    a = jax.random.normal(key_a, (n, n))
    b = jax.random.normal(key_b, (n, n))
# Matrix multiplication
c = jnp.dot(a, b)
# Matrix inversion
try:
inv_a = jnp.linalg.inv(a)
    except Exception:
inv_a = jnp.zeros_like(a)
# Singular value decomposition
try:
u, s, vh = jnp.linalg.svd(a)
    except Exception:
u, s, vh = jnp.zeros_like(a), jnp.zeros(n), jnp.zeros_like(a)
return c, inv_a, (u, s, vh)
# Test different matrix sizes
matrix_sizes = [100, 200, 500, 1000]
gpu_matrix_times = []
cpu_matrix_times = []
for size in matrix_sizes:
print(f"Testing matrix size {size}x{size}...")
# Time GPU computation
if gpu_available:
start_time = time.time()
        result_gpu = jax.block_until_ready(matrix_operations(size))  # block so the timing is meaningful
gpu_matrix_time = time.time() - start_time
gpu_matrix_times.append(gpu_matrix_time)
else:
gpu_matrix_time = float('nan')
gpu_matrix_times.append(gpu_matrix_time)
# Time CPU computation
with jax.default_device(jax.devices('cpu')[0]):
start_time = time.time()
        result_cpu = jax.block_until_ready(matrix_operations(size))
cpu_matrix_time = time.time() - start_time
cpu_matrix_times.append(cpu_matrix_time)
print(f" GPU: {gpu_matrix_time:.6f}s, CPU: {cpu_matrix_time:.6f}s")
# Plot matrix operation performance
plt.figure(figsize=(10, 6))
plt.plot(matrix_sizes, gpu_matrix_times, label='GPU', marker='o')
plt.plot(matrix_sizes, cpu_matrix_times, label='CPU', marker='o')
plt.title('Matrix Operations Performance')
plt.xlabel('Matrix Size')
plt.ylabel('Time (seconds)')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.grid(True)
plt.show()
# 3. Neural network training
print("\n3. Neural Network Training...")
def neural_network_training(n_samples, n_features, n_epochs, batch_size):
"""Neural network training benchmark"""
key = jax.random.PRNGKey(42)
# Generate synthetic data
keys = jax.random.split(key, 2)
x = jax.random.normal(keys[0], (n_samples, n_features))
y = jnp.sin(jnp.sum(x, axis=1, keepdims=True)) + 0.1 * jax.random.normal(keys[1], (n_samples, 1))
# Initialize parameters
input_dim = n_features
hidden_dim = 64
output_dim = 1
keys = jax.random.split(key, 4)
params = [
jax.random.normal(keys[0], (input_dim, hidden_dim)) * jnp.sqrt(2.0 / input_dim),
jnp.zeros(hidden_dim),
jax.random.normal(keys[1], (hidden_dim, hidden_dim)) * jnp.sqrt(2.0 / hidden_dim),
jnp.zeros(hidden_dim),
jax.random.normal(keys[2], (hidden_dim, output_dim)) * jnp.sqrt(2.0 / hidden_dim),
jnp.zeros(output_dim)
]
# Define network and loss
def network(params, x):
w1, b1, w2, b2, w3, b3 = params
h1 = jnp.tanh(jnp.dot(x, w1) + b1)
h2 = jnp.tanh(jnp.dot(h1, w2) + b2)
return jnp.dot(h2, w3) + b3
def loss(params, x, y):
y_pred = network(params, x)
return jnp.mean((y_pred - y)**2)
# Initialize optimizer
optimizer = optax.adam(learning_rate=0.01)
opt_state = optimizer.init(params)
# Define update function
@jax.jit
def update(params, opt_state, x, y):
loss_val, grads = jax.value_and_grad(loss)(params, x, y)
updates, opt_state = optimizer.update(grads, opt_state)
params = optax.apply_updates(params, updates)
return params, opt_state, loss_val
# Training loop
for epoch in range(n_epochs):
# Shuffle data
key, subkey = jax.random.split(key)
perm = jax.random.permutation(subkey, len(x))
x_shuffled = x[perm]
y_shuffled = y[perm]
# Mini-batch training
epoch_loss = 0.0
for i in range(0, len(x), batch_size):
x_batch = x_shuffled[i:i+batch_size]
y_batch = y_shuffled[i:i+batch_size]
params, opt_state, loss_val = update(params, opt_state, x_batch, y_batch)
epoch_loss += loss_val * len(x_batch)
epoch_loss /= len(x)
return epoch_loss
# Test different configurations
configurations = [
(1000, 10, 10, 32),
(10000, 20, 20, 64),
(100000, 50, 50, 128)
]
gpu_nn_times = []
cpu_nn_times = []
for n_samples, n_features, n_epochs, batch_size in configurations:
print(f"\nTesting NN training: {n_samples} samples, {n_features} features, {n_epochs} epochs")
# Time GPU computation
if gpu_available:
start_time = time.time()
gpu_loss = neural_network_training(n_samples, n_features, n_epochs, batch_size)
gpu_nn_time = time.time() - start_time
gpu_nn_times.append(gpu_nn_time)
else:
        gpu_nn_time = float('nan')
        gpu_loss = float('nan')  # keep the summary print below working without a GPU
        gpu_nn_times.append(gpu_nn_time)
# Time CPU computation
with jax.default_device(jax.devices('cpu')[0]):
start_time = time.time()
cpu_loss = neural_network_training(n_samples, n_features, n_epochs, batch_size)
cpu_nn_time = time.time() - start_time
cpu_nn_times.append(cpu_nn_time)
print(f" GPU: {gpu_nn_time:.2f}s, CPU: {cpu_nn_time:.2f}s")
print(f" Final loss - GPU: {gpu_loss:.6f}, CPU: {cpu_loss:.6f}")
# Plot neural network training performance
plt.figure(figsize=(10, 6))
config_labels = [f"{n}s,{f}f,{e}e" for n, f, e, _ in configurations]
plt.plot(config_labels, gpu_nn_times, label='GPU', marker='o')
plt.plot(config_labels, cpu_nn_times, label='CPU', marker='o')
plt.title('Neural Network Training Performance')
plt.xlabel('Configuration (samples,features,epochs)')
plt.ylabel('Time (seconds)')
plt.legend()
plt.grid(True)
plt.show()
# 4. Memory usage comparison
print("\n4. Memory Usage Comparison...")
if gpu_available:
    # Array footprints are estimated from their size and dtype below; for real
    # measurements use platform-specific tools (e.g. nvidia-smi for GPU memory).
# Test memory usage for large arrays
array_sizes = [1000, 10000, 100000, 1000000]
gpu_memory = []
cpu_memory = []
for size in array_sizes:
print(f"Testing memory usage for array size {size}...")
# GPU memory
if gpu_available:
x_gpu = jax.device_put(jnp.ones(size), gpu_device)
            # Report the array's own footprint (float32 = 4 bytes per element)
            gpu_mem = x_gpu.nbytes / (1024 * 1024)
gpu_memory.append(gpu_mem)
else:
gpu_memory.append(float('nan'))
# CPU memory
x_cpu = jax.device_put(jnp.ones(size), jax.devices('cpu')[0])
        # Report the array's own footprint (float32 = 4 bytes per element)
        cpu_mem = x_cpu.nbytes / (1024 * 1024)
cpu_memory.append(cpu_mem)
print(f" GPU: {gpu_mem:.2f} MB, CPU: {cpu_mem:.2f} MB")
# Plot memory usage
plt.figure(figsize=(10, 6))
plt.plot(array_sizes, gpu_memory, label='GPU', marker='o')
plt.plot(array_sizes, cpu_memory, label='CPU', marker='o')
plt.title('Memory Usage Comparison')
plt.xlabel('Array Size')
plt.ylabel('Memory (MB)')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.grid(True)
plt.show()
else:
print("GPU memory usage comparison not available")
Challenges
Conceptual Challenges
- Functional Programming: Pure functions and immutability (see the sketch after this list)
- Automatic Differentiation: Understanding gradient computation
- JIT Compilation: Working with compiled functions
- Device Management: GPU/TPU memory management
- Parallel Computing: Multi-device synchronization
- Numerical Stability: Precision and numerical issues
- Debugging: Debugging compiled code
- Performance Optimization: Maximizing hardware utilization
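A minimal sketch of why the functional-programming point matters in practice (the variable names are illustrative): jit traces a function once per input signature, so Python state captured by closure is frozen at trace time instead of being read on every call.

```python
import jax
import jax.numpy as jnp

scale = 2.0

@jax.jit
def impure(x):
    return scale * x            # 'scale' is captured when the function is traced

print(impure(jnp.array(1.0)))   # 2.0

scale = 10.0
print(impure(jnp.array(1.0)))   # still 2.0: the cached program kept the old value

@jax.jit
def pure(x, scale):
    return scale * x            # pass state explicitly instead

print(pure(jnp.array(1.0), 10.0))  # 10.0
```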
Practical Challenges
- Learning Curve: Transitioning from imperative to functional style
- Hardware Requirements: GPU/TPU requirements
- Memory Management: Large array memory usage
- Integration: Combining with existing codebases
- Debugging: Limited debugging tools for compiled code
- Error Messages: Sometimes cryptic error messages
- Version Compatibility: Keeping up with rapid development
- Production Deployment: Deploying JAX models in production
Technical Challenges
- Control Flow: Limited support for Python control flow
- Dynamic Shapes: Handling variable-sized inputs without constant recompilation (see the padding sketch after this list)
- Memory Bandwidth: GPU memory bandwidth limitations
- Numerical Precision: Floating-point precision issues
- Custom Operations: Implementing custom GPU kernels
- Distributed Computing: Scaling across multiple devices
- Mixed Precision: Using reduced precision for performance
- Determinism: Ensuring reproducible results
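For the dynamic-shapes challenge, one common workaround is sketched below: pad variable-length inputs to a fixed bucket size so jit sees a single shape and compiles once. The bucket size of 8 and the `masked_mean` helper are arbitrary choices for illustration.

```python
import jax
import jax.numpy as jnp

BUCKET = 8  # arbitrary fixed size for illustration

@jax.jit
def masked_mean(x, mask):
    # mask is 1.0 for real entries and 0.0 for padding
    return jnp.sum(x * mask) / jnp.maximum(jnp.sum(mask), 1.0)

def mean_of(values):
    n = len(values)
    padded = jnp.zeros(BUCKET).at[:n].set(jnp.asarray(values))
    mask = jnp.zeros(BUCKET).at[:n].set(1.0)
    return masked_mean(padded, mask)  # same compiled program for any n <= BUCKET

print(mean_of([1.0, 2.0, 3.0]))        # 2.0
print(mean_of([5.0, 7.0, 9.0, 11.0]))  # 8.0, no recompilation
```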
Research and Advancements
Key Developments
- "JAX: Composable Transformations of Python+NumPy Programs" (Bradbury et al., 2018)
- Introduced JAX framework
- Presented automatic differentiation and JIT compilation
- Demonstrated composable function transformations
- "Compiling Machine Learning Programs via High-Level Tracing" (Frostig et al., 2018)
- Presented XLA compilation for ML
- Demonstrated performance optimizations
- Showed integration with deep learning frameworks
- "Automatic Differentiation in Machine Learning: A Survey" (Baydin et al., 2018)
- Surveyed automatic differentiation techniques
- Demonstrated applications in ML
- Showed JAX's approach to AD
- "JAX: Accelerated Machine Learning Research" (2020)
- Presented JAX for ML research
- Demonstrated performance benchmarks
- Showed applications in deep learning
- "Scalable Machine Learning with JAX" (2021)
- Presented distributed computing with JAX
- Demonstrated multi-device training
- Showed large-scale ML applications
Emerging Research Directions
- Automated Machine Learning: AutoML with JAX
- Neurosymbolic AI: Combining neural networks and symbolic reasoning
- Quantum Machine Learning: Quantum computing integration
- Green AI: Energy-efficient computing
- Edge Computing: JAX on edge devices
- Multimodal Learning: Processing multiple data modalities
- Explainable AI: Interpretability in JAX models
- Federated Learning: Privacy-preserving distributed learning
- Neuromorphic Computing: Brain-inspired computing
- Automated Optimization: Auto-tuning of JAX programs
Best Practices
Development
- Start Small: Begin with simple functions and scale up
- Use JIT: Compile performance-critical functions
- Vectorize: Use vmap for batch operations
- Manage Randomness: Use explicit random keys
- Handle Control Flow: Use JAX primitives (lax.cond, lax.scan, lax.fori_loop) instead of Python control flow, as in the sketch below
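As a sketch of the control-flow advice (the exponential moving average is just an illustrative computation), a Python loop that carries state can be written with jax.lax.scan so it stays compatible with jit and grad:

```python
import jax
import jax.numpy as jnp

@jax.jit
def exponential_moving_average(xs, alpha=0.9):
    def step(carry, x):
        new_carry = alpha * carry + (1.0 - alpha) * x
        return new_carry, new_carry   # (next state, per-step output)
    _, ema = jax.lax.scan(step, xs[0], xs)
    return ema

xs = jnp.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(exponential_moving_average(xs))
```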
Performance
- Profile First: Identify bottlenecks before optimizing (see the timing sketch after this list)
- Use GPU/TPU: Leverage hardware acceleration
- Optimize Memory: Minimize memory usage
- Batch Operations: Process data in batches
- Parallelize: Use pmap for multi-device computing
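A hedged sketch of the "profile first" advice: because dispatch is asynchronous, wall-clock timings should call block_until_ready, and a trace can optionally be written for TensorBoard's profiler plugin; the work function and log directory are illustrative.

```python
import time
import jax
import jax.numpy as jnp

@jax.jit
def work(x):
    return jnp.sum(jnp.sin(x) @ jnp.cos(x.T))

x = jax.random.normal(jax.random.PRNGKey(0), (2000, 2000))
work(x).block_until_ready()            # warm-up call pays the compilation cost

start = time.time()
work(x).block_until_ready()            # block so the wall-clock time is meaningful
print(f"steady-state time: {time.time() - start:.4f}s")

with jax.profiler.trace("/tmp/jax-trace"):   # illustrative log directory
    work(x).block_until_ready()
```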
Debugging
- Test Incrementally: Test functions before JIT compilation
- Use Smaller Inputs: Debug with smaller data
- Check Shapes: Verify array shapes and dimensions
- Monitor Memory: Watch for memory leaks
- Use Debugging Tools: Leverage utilities such as jax.debug.print and jax.disable_jit (see the sketch below)
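A short sketch of two of those utilities (the normalize function is illustrative): jax.debug.print fires with runtime values inside jitted code, where an ordinary Python print would only run at trace time, and jax.disable_jit() runs everything eagerly for easier inspection.

```python
import jax
import jax.numpy as jnp

@jax.jit
def normalize(x):
    norm = jnp.linalg.norm(x)
    jax.debug.print("norm = {n}", n=norm)   # printed on every call with runtime values
    return x / norm

normalize(jnp.array([3.0, 4.0]))

with jax.disable_jit():                     # temporarily run without compilation
    normalize(jnp.array([3.0, 4.0]))
```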
Deployment
- Save Models: Persist trained parameter pytrees to disk (see the sketch after this list)
- Optimize Inference: JIT compile inference functions
- Handle Dependencies: Manage JAX and hardware dependencies
- Monitor Performance: Track performance in production
- Plan for Updates: Account for JAX version updates
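A hedged sketch of the save/optimize-inference points: parameters are just a pytree of arrays, so they can be pickled to disk, reloaded, and served through a jit-compiled prediction function; the parameter layout and file name are illustrative.

```python
import pickle
import jax
import jax.numpy as jnp

params = {"w": jnp.ones((3, 1)), "b": jnp.zeros(1)}  # illustrative parameter pytree

# Save as host (NumPy) arrays so the file does not depend on a device
with open("model_params.pkl", "wb") as f:
    pickle.dump(jax.tree_util.tree_map(jax.device_get, params), f)

with open("model_params.pkl", "rb") as f:
    loaded = jax.tree_util.tree_map(jnp.asarray, pickle.load(f))

@jax.jit
def predict(p, x):
    return x @ p["w"] + p["b"]

print(predict(loaded, jnp.ones((2, 3))))
```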
External Resources
- JAX Official Website
- JAX GitHub Repository
- JAX Documentation
- JAX Tutorials
- JAX API Reference
- JAX GitHub Discussions
- JAX Issue Tracker
- JAX Release Notes
- JAX Examples
- JAX Cheat Sheet
- JAX on TPUs
- JAX for Deep Learning
- JAX Autodiff Cookbook
- JAX Performance Guide
- Optax (Optimization for JAX)
- Flax (Neural Networks for JAX)
- Haiku (Deep Learning for JAX)
- JAX Community
- JAX Research Papers
- JAX Benchmarks