Week 8: Training Foundation Models
The practical reality of training billion-parameter models
Time Estimate: 3-4 hours
Topics Covered
- Data curation and preprocessing at scale
- Tokenization strategies (BPE, WordPiece, SentencePiece)
- Learning rate schedules and warmup strategies
- The Pile dataset and training data quality
- Compute-optimal training strategies
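One of the topics above, learning rate schedules with warmup, is concrete enough to sketch directly. Below is a minimal linear-warmup-then-cosine-decay schedule of the kind used in GPT-2-style training runs; the specific values (`max_lr`, `warmup_steps`, etc.) are illustrative assumptions, not any particular model's configuration.

```python
import math

# Illustrative hyperparameters, not a specific model's config.
max_lr, min_lr = 6e-4, 6e-5
warmup_steps, max_steps = 100, 1000

def get_lr(step: int) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from max_lr/warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    # Cosine coefficient goes 1 -> 0 over the decay phase.
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))
    return min_lr + coeff * (max_lr - min_lr)
```

In practice this function is called once per optimizer step and the result written into each parameter group's `lr` before `optimizer.step()`.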
Featured Speaker
Sam Altman
CEO, OpenAI
Learn from industry leaders who are building the future of AI infrastructure and applications.
Video Resources
📹 Video content will be added here.
Videos include keynotes, technical talks, and tutorials from industry leaders.
Reading Materials
📚 Reading list will be added here.
Research papers, blog posts, and technical documentation.
🛠️ Hands-On Lab
Reproduce GPT-2 (Small)
Difficulty: Advanced · Time: 4 hours
Objective
Follow Karpathy's GPT-2 reproduction, implement checkpointing, and monitor training with Weights & Biases.
Prerequisites
- Strong PyTorch skills
- Transformer architecture mastery
- W&B account
- Google Colab Pro or cloud GPU instance
Setup Instructions
- Watch Karpathy's GPT-2 video (2x speed): https://youtu.be/l8pRSuU81PU
- Clone the repo: `git clone https://github.com/karpathy/nanoGPT`
- Prepare the FineWeb-Edu dataset (10B tokens)
- Set up W&B logging and checkpointing
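The dataset-prep step above boils down to tokenizing text and writing the ids as a flat binary file of `uint16` values, which is the on-disk format nanoGPT's prepare scripts use. Here is a minimal sketch of that round trip; the byte-level "tokenizer" is a stand-in assumption so the example stays self-contained (the real pipeline uses the GPT-2 BPE over FineWeb-Edu).

```python
import os
import tempfile
import numpy as np

def write_shard(text: str, path: str) -> int:
    """Tokenize text and append the ids to a flat uint16 binary shard."""
    # Stand-in tokenizer: raw UTF-8 bytes. The real prep script would emit
    # GPT-2 BPE token ids (all < 50257, hence uint16 suffices there too).
    ids = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.uint16)
    ids.tofile(path)  # flat uint16 stream, nanoGPT-style
    return len(ids)

def read_shard(path: str) -> np.ndarray:
    # Training code typically np.memmap's this file; fromfile suffices here.
    return np.fromfile(path, dtype=np.uint16)

path = os.path.join(tempfile.mkdtemp(), "shard.bin")
n_tokens = write_shard("hello, world", path)
tokens = read_shard(path)
```

At training time the data loader slices random windows of `block_size + 1` tokens out of this array to form input/target pairs.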
Tasks
- Reproduce GPT-2 (124M) training run
- Implement gradient checkpointing for memory efficiency
- Add model checkpointing every 1000 steps
- Monitor loss curves and the learning rate schedule in W&B
- Resume from checkpoint and continue training
- Generate samples and evaluate perplexity
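The checkpoint-and-resume tasks above can be sketched with a few lines of PyTorch. This is a minimal illustration, not nanoGPT's actual checkpoint code: the tiny `nn.Linear` model, file name, and helper names are assumptions, but the pattern (saving model state, optimizer state, and step together, then restoring all three) is the one the tasks call for.

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(path, model, optimizer, step):
    # Persist everything needed to resume: weights, optimizer state, step.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

# Toy stand-in for the GPT-2 model: one training step, then checkpoint.
model = nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
model(torch.randn(2, 4)).sum().backward()
opt.step()

path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
save_checkpoint(path, model, opt, step=1000)

# "Resume": fresh model and optimizer restored from disk.
resumed = nn.Linear(4, 4)
resumed_opt = torch.optim.AdamW(resumed.parameters(), lr=3e-4)
step = load_checkpoint(path, resumed, resumed_opt)
```

In a real run you would call `save_checkpoint` inside the training loop whenever `step % 1000 == 0`, and start the loop at the returned `step` when resuming.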