Week 7: Foundation Models – Architecture & Scaling
How GPT, Claude, Gemini, and other LLMs are built
Time Estimate: 3-4 hours
Topics Covered
- GPT vs. BERT vs. T5 architectures (decoder-only, encoder-only, encoder-decoder)
- Scaling laws and emergent abilities
- Relationship between compute, data, and model size
- Chinchilla scaling laws
- The path to GPT-4 and beyond
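The Chinchilla result referenced above (Hoffmann et al., 2022) suggests compute-optimal training uses roughly 20 tokens per parameter, with training compute approximated by the standard 6ND FLOPs rule of thumb. A minimal sketch of that arithmetic (both the 20:1 ratio and the 6ND estimate are approximations, not exact counts):

```python
# Rough compute-optimal sizing per the Chinchilla rule of thumb:
# tokens ~= 20 * params, and training FLOPs ~= 6 * params * tokens.
def chinchilla_optimal(params: float) -> dict:
    tokens = 20 * params          # ~20 tokens per parameter (approximation)
    flops = 6 * params * tokens   # standard 6ND training-FLOPs estimate
    return {"params": params, "tokens": tokens, "flops": flops}

for n in (124e6, 1e9, 70e9):
    est = chinchilla_optimal(n)
    print(f"{n/1e6:>8.0f}M params -> {est['tokens']/1e9:,.1f}B tokens, "
          f"{est['flops']:.2e} training FLOPs")
```

For the 70B-parameter case this reproduces the familiar ~1.4T-token figure from the Chinchilla paper.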
Featured Speaker
Sam Altman
CEO, OpenAI
Learn from industry leaders who are building the future of AI infrastructure and applications.
Video Resources
📹 Video content will be added here by Agent 2
Videos include keynotes, technical talks, and tutorials from industry leaders.
Reading Materials
📚 Reading list will be added here by Agent 3
Research papers, blog posts, and technical documentation.
🛠️ Hands-On Lab
Scaling Experiments with nanoGPT
Difficulty: Intermediate · Duration: 4 hours
Objective
Train models at different scales, measure scaling laws empirically, and experiment with distributed data parallelism.
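The distributed-data-parallelism part of this objective can be sketched minimally before touching nanoGPT itself. Below is an assumed-standalone toy script (the `Linear` model and the filename `train_ddp.py` are placeholders, not part of nanoGPT); it uses the CPU `gloo` backend so it also runs single-process, and `torchrun` would launch it across GPUs:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE; default to a single process so the
    # sketch also runs standalone without a launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", 0))
    world = int(os.environ.get("WORLD_SIZE", 1))
    dist.init_process_group("gloo", rank=rank, world_size=world)

    model = DDP(torch.nn.Linear(8, 1))  # stand-in for the GPT model
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()   # DDP all-reduces gradients across ranks here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, e.g., `torchrun --nproc_per_node=2 train_ddp.py`, each rank processes its own shard of the batch and gradients are averaged during `backward()`.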
Prerequisites
- PyTorch experience
- Understanding of transformer architecture
- Weights & Biases account (free)
- Google Colab Pro recommended (for multi-GPU access)
Setup Instructions
- Clone nanoGPT: `git clone https://github.com/karpathy/nanoGPT`
- Install dependencies: `pip install torch numpy wandb`
- Set up W&B: `wandb login`
- Prepare the OpenWebText dataset (or use Shakespeare for faster iteration)
Tasks
- Train nanoGPT at 3 scales: 10M, 50M, 124M parameters
- Plot loss vs compute (FLOPs)
- Experiment with PyTorch DDP on 2 GPUs
- Compare training efficiency (tokens/sec)
- Analyze scaling behavior and extrapolate trends
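For the plotting and extrapolation tasks above, one approach is to fit a power law L(C) = a·C^(−b) to the measured (compute, loss) points on log-log axes. A sketch using NumPy, where the FLOPs and loss values are placeholders to be replaced with your own measurements from the 10M / 50M / 124M runs (per-run FLOPs can be estimated as 6 × params × tokens):

```python
import numpy as np

# Placeholder (training FLOPs, final val-loss) pairs -- substitute your
# own measurements from the three nanoGPT runs.
flops = np.array([1e17, 5e17, 1.2e18])
loss = np.array([3.8, 3.3, 3.0])

# Fit log(L) = log(a) - b*log(C), i.e. a power law L = a * C^(-b).
slope, log_a = np.polyfit(np.log(flops), np.log(loss), 1)
a, b = np.exp(log_a), -slope

# Extrapolate to a 10x larger compute budget.
c_big = 10 * flops[-1]
print(f"fit: L ~= {a:.3g} * C^(-{b:.3f})")
print(f"predicted loss at {c_big:.1e} FLOPs: {a * c_big ** (-b):.3f}")
```

With only three points the fit is fragile; treat the extrapolated value as a trend check against the published scaling-law exponents, not a prediction.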