Week 7: Foundation Models – Architecture & Scaling
How GPT, Claude, Gemini, and other LLMs are built
Time Estimate: 3-4 hours
Topics Covered
- GPT vs. BERT vs. T5 architectures (decoder-only, encoder-only, encoder-decoder)
- Scaling laws and emergent abilities
- Relationship between compute, data, and model size
- Chinchilla scaling laws
- The path to GPT-4 and beyond
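The Chinchilla result referenced above (Hoffmann et al., 2022) suggests compute-optimal training uses roughly 20 tokens per parameter, with training compute approximated by the standard 6ND FLOPs rule of thumb. A minimal sketch of that arithmetic (both the 20:1 ratio and the 6ND estimate are approximations, not exact counts):

```python
# Rough compute-optimal sizing per the Chinchilla rule of thumb:
# tokens ~= 20 * params, and training FLOPs ~= 6 * params * tokens.
def chinchilla_optimal(params: float) -> dict:
    tokens = 20 * params          # ~20 tokens per parameter (approximation)
    flops = 6 * params * tokens   # standard 6ND training-FLOPs estimate
    return {"params": params, "tokens": tokens, "flops": flops}

for n in (124e6, 1e9, 70e9):
    est = chinchilla_optimal(n)
    print(f"{n/1e6:>8.0f}M params -> {est['tokens']/1e9:,.1f}B tokens, "
          f"{est['flops']:.2e} training FLOPs")
```

For the 70B-parameter case this reproduces the familiar ~1.4T-token figure from the Chinchilla paper.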
Featured Speaker
Sam Altman
CEO, OpenAI
Learn from industry leaders who are building the future of AI infrastructure and applications.
Video Resources
📹 Video content will be added here by Agent 2
Videos include keynotes, technical talks, and tutorials from industry leaders.
Reading Materials
📚 Reading list will be added here by Agent 3
Research papers, blog posts, and technical documentation.
🛠️ Hands-On Lab
Scaling Experiments with nanoGPT
Difficulty: Intermediate · Duration: 4 hours
Objective
Train models at different scales, measure scaling laws empirically, and experiment with distributed data parallelism.
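The distributed-data-parallelism part of this objective can be sketched minimally before touching nanoGPT itself. Below is an assumed-standalone toy script (the `Linear` model and the filename `train_ddp.py` are placeholders, not part of nanoGPT); it uses the CPU `gloo` backend so it also runs single-process, and `torchrun` would launch it across GPUs:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE; default to a single process so the
    # sketch also runs standalone without a launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", 0))
    world = int(os.environ.get("WORLD_SIZE", 1))
    dist.init_process_group("gloo", rank=rank, world_size=world)

    model = DDP(torch.nn.Linear(8, 1))  # stand-in for the GPT model
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()   # DDP all-reduces gradients across ranks here
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, e.g., `torchrun --nproc_per_node=2 train_ddp.py`, each rank processes its own shard of the batch and gradients are averaged during `backward()`.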
Prerequisites
- PyTorch experience
- Understanding of transformer architecture
- Weights & Biases account (free)
- Google Colab Pro recommended (for multi-GPU access)
Setup Instructions
- Clone nanoGPT: `git clone https://github.com/karpathy/nanoGPT`
- Install dependencies: `pip install torch numpy wandb`
- Set up W&B: `wandb login`
- Prepare the OpenWebText dataset (or use Shakespeare for faster iteration)
Tasks
- Train nanoGPT at 3 scales: 10M, 50M, 124M parameters
- Plot loss vs compute (FLOPs)
- Experiment with PyTorch DDP on 2 GPUs
- Compare training efficiency (tokens/sec)
- Analyze scaling behavior and extrapolate trends
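For the plotting and extrapolation tasks above, one approach is to fit a power law L(C) = a·C^(−b) to the measured (compute, loss) points on log-log axes. A sketch using NumPy, where the FLOPs and loss values are placeholders to be replaced with your own measurements from the 10M / 50M / 124M runs (per-run FLOPs can be estimated as 6 × params × tokens):

```python
import numpy as np

# Placeholder (training FLOPs, final val-loss) pairs -- substitute your
# own measurements from the three nanoGPT runs.
flops = np.array([1e17, 5e17, 1.2e18])
loss = np.array([3.8, 3.3, 3.0])

# Fit log(L) = log(a) - b*log(C), i.e. a power law L = a * C^(-b).
slope, log_a = np.polyfit(np.log(flops), np.log(loss), 1)
a, b = np.exp(log_a), -slope

# Extrapolate to a 10x larger compute budget.
c_big = 10 * flops[-1]
print(f"fit: L ~= {a:.3g} * C^(-{b:.3f})")
print(f"predicted loss at {c_big:.1e} FLOPs: {a * c_big ** (-b):.3f}")
```

With only three points the fit is fragile; treat the extrapolated value as a trend check against the published scaling-law exponents, not a prediction.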