Week 8: Training Foundation Models
The practical reality of training billion-parameter models
Time Estimate: 3-4 hours
Topics Covered
- Data curation and preprocessing at scale
- Tokenization strategies (BPE, WordPiece, SentencePiece)
- Learning rate schedules and warmup strategies
- The Pile dataset and training data quality
- Compute-optimal training strategies
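One of the topics above, learning rate schedules with warmup, is concrete enough to sketch directly. Below is a minimal linear-warmup-then-cosine-decay schedule of the kind used in GPT-2-style training runs; the specific values (`max_lr`, `warmup_steps`, etc.) are illustrative assumptions, not any particular model's configuration.

```python
import math

# Illustrative hyperparameters, not a specific model's config.
max_lr, min_lr = 6e-4, 6e-5
warmup_steps, max_steps = 100, 1000

def get_lr(step: int) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from max_lr/warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    # Cosine coefficient goes 1 -> 0 over the decay phase.
    ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))
    return min_lr + coeff * (max_lr - min_lr)
```

In practice this function is called once per optimizer step and the result written into each parameter group's `lr` before `optimizer.step()`.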
Featured Speaker
Sam Altman
CEO, OpenAI
Learn from industry leaders who are building the future of AI infrastructure and applications.
Video Resources
📹 Video content will be added here.
Videos include keynotes, technical talks, and tutorials from industry leaders.
Reading Materials
📚 Reading list will be added here.
Research papers, blog posts, and technical documentation.
🛠️ Hands-On Lab
Reproduce GPT-2 (Small)
Difficulty: Advanced · Time: 4 hours
Objective
Follow Karpathy's GPT-2 reproduction, implement checkpointing, and monitor training with Weights & Biases.
Prerequisites
- Strong PyTorch skills
- Transformer architecture mastery
- W&B account
- Google Colab Pro or cloud GPU instance
Setup Instructions
- Watch Karpathy's GPT-2 video (2x speed): https://youtu.be/l8pRSuU81PU
- Clone the repo: `git clone https://github.com/karpathy/nanoGPT`
- Prepare the FineWeb-Edu dataset (10B tokens)
- Set up W&B logging and checkpointing
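The dataset-prep step above boils down to tokenizing text and writing the ids as a flat binary file of `uint16` values, which is the on-disk format nanoGPT's prepare scripts use. Here is a minimal sketch of that round trip; the byte-level "tokenizer" is a stand-in assumption so the example stays self-contained (the real pipeline uses the GPT-2 BPE over FineWeb-Edu).

```python
import os
import tempfile
import numpy as np

def write_shard(text: str, path: str) -> int:
    """Tokenize text and append the ids to a flat uint16 binary shard."""
    # Stand-in tokenizer: raw UTF-8 bytes. The real prep script would emit
    # GPT-2 BPE token ids (all < 50257, hence uint16 suffices there too).
    ids = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.uint16)
    ids.tofile(path)  # flat uint16 stream, nanoGPT-style
    return len(ids)

def read_shard(path: str) -> np.ndarray:
    # Training code typically np.memmap's this file; fromfile suffices here.
    return np.fromfile(path, dtype=np.uint16)

path = os.path.join(tempfile.mkdtemp(), "shard.bin")
n_tokens = write_shard("hello, world", path)
tokens = read_shard(path)
```

At training time the data loader slices random windows of `block_size + 1` tokens out of this array to form input/target pairs.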
Tasks
- Reproduce GPT-2 (124M) training run
- Implement gradient checkpointing for memory efficiency
- Add model checkpointing every 1000 steps
- Monitor loss curves and the learning rate schedule in W&B
- Resume from checkpoint and continue training
- Generate samples and evaluate perplexity
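The checkpoint-and-resume tasks above can be sketched with a few lines of PyTorch. This is a minimal illustration, not nanoGPT's actual checkpoint code: the tiny `nn.Linear` model, file name, and helper names are assumptions, but the pattern (saving model state, optimizer state, and step together, then restoring all three) is the one the tasks call for.

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(path, model, optimizer, step):
    # Persist everything needed to resume: weights, optimizer state, step.
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

# Toy stand-in for the GPT-2 model: one training step, then checkpoint.
model = nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
model(torch.randn(2, 4)).sum().backward()
opt.step()

path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
save_checkpoint(path, model, opt, step=1000)

# "Resume": fresh model and optimizer restored from disk.
resumed = nn.Linear(4, 4)
resumed_opt = torch.optim.AdamW(resumed.parameters(), lr=3e-4)
step = load_checkpoint(path, resumed, resumed_opt)
```

In a real run you would call `save_checkpoint` inside the training loop whenever `step % 1000 == 0`, and start the loop at the returned `step` when resuming.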