Week 8: Training Foundation Models

The practical reality of training billion-parameter models

Time Estimate: 3-4 hours

Topics Covered

Featured Speaker


Sam Altman

CEO, OpenAI

Learn from industry leaders who are building the future of AI infrastructure and applications.

Video Resources

📹 Video content coming soon.

Videos include keynotes, technical talks, and tutorials from industry leaders.

Reading Materials

📚 Reading list coming soon.

Research papers, blog posts, and technical documentation.

🛠️ Hands-On Lab

Reproduce GPT-2 (Small)

Difficulty: Advanced · Time: 4 hours

Objective

Follow Karpathy's GPT-2 reproduction, implement checkpointing so long runs can survive interruptions, and monitor training metrics with Weights & Biases.

Prerequisites

  • Strong PyTorch skills
  • Transformer architecture mastery
  • W&B account
  • Google Colab Pro or cloud GPU instance

Setup Instructions

  1. Watch Karpathy's GPT-2 video (2x speed): https://youtu.be/l8pRSuU81PU
  2. Clone repo: git clone https://github.com/karpathy/nanoGPT
  3. Prepare FineWeb-Edu dataset (10B tokens)
  4. Set up W&B logging and checkpointing
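For step 3, nanoGPT-style prepare scripts tokenize the corpus and write the token ids as a flat binary file that the training loop memory-maps, rather than loading the whole dataset into RAM. The sketch below uses made-up token ids and a temp-dir `train.bin` path (both assumptions, not real FineWeb-Edu output) just to show the on-disk format and the memmap read-back:

```python
import os
import tempfile

import numpy as np

# Toy stand-in for real tokenizer output. GPT-2 BPE ids fit in uint16
# (vocab size 50257 < 65536), which is why the format uses that dtype.
token_ids = [15496, 11, 995, 0, 50256]  # arbitrary example ids

out_dir = tempfile.mkdtemp()
bin_path = os.path.join(out_dir, "train.bin")

# Write the ids as a flat uint16 array -- the whole file is just tokens,
# no header, so offsets are simple: token i lives at byte 2*i.
np.array(token_ids, dtype=np.uint16).tofile(bin_path)

# The training loop can then memory-map the file and slice batches out of
# it on demand instead of holding 10B tokens in memory.
data = np.memmap(bin_path, dtype=np.uint16, mode="r")
print(len(data), int(data[0]))
```

At 10B tokens this file is about 20 GB, which is exactly why the memmap read path matters.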

Tasks

  1. Reproduce GPT-2 (124M) training run
  2. Implement gradient checkpointing for memory efficiency
  3. Add model checkpointing every 1000 steps
  4. Monitor loss curves, learning rate schedule in W&B
  5. Resume from checkpoint and continue training
  6. Generate samples and evaluate perplexity

Resources