Week 7: Foundation Models – Architecture & Scaling

How GPT, Claude, Gemini, and other LLMs are built

Time Estimate: 3-4 hours

Topics Covered

Featured Speaker


Sam Altman

CEO, OpenAI

Learn from industry leaders who are building the future of AI infrastructure and applications.

Video Resources

📹 Video content coming soon.

Videos include keynotes, technical talks, and tutorials from industry leaders.

Reading Materials

📚 Reading list coming soon.

Research papers, blog posts, and technical documentation.

🛠️ Hands-On Lab

Scaling Experiments with nanoGPT

Difficulty: Intermediate · Time: 4 hours

Objective

Train models at different scales, measure scaling laws empirically, and experiment with distributed data parallelism.

Prerequisites

  • PyTorch experience
  • Understanding of transformer architecture
  • Weights & Biases account (free)
  • Google Colab Pro recommended (for multi-GPU experiments)

Setup Instructions

  1. Clone nanoGPT: git clone https://github.com/karpathy/nanoGPT
  2. Install dependencies: pip install torch numpy wandb
  3. Set up W&B: wandb login
  4. Prepare OpenWebText dataset (or use Shakespeare for faster iteration)
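The steps above can be collected into a single script. This is a sketch assuming a Unix-like shell with Python and pip on the PATH; the dataset-preparation script paths are the ones shipped in the nanoGPT repository, and the OpenWebText prepare step additionally needs the `datasets` and `tiktoken` packages.

```shell
# 1. Clone the nanoGPT repository and enter it
git clone https://github.com/karpathy/nanoGPT
cd nanoGPT

# 2. Install the core dependencies
pip install torch numpy wandb

# 3. Authenticate with Weights & Biases (prompts for your API key)
wandb login

# 4a. Prepare the tiny Shakespeare dataset for fast iteration
python data/shakespeare_char/prepare.py

# 4b. Or prepare OpenWebText for the full experiments
#     (large download; requires extra packages)
# pip install datasets tiktoken
# python data/openwebtext/prepare.py
```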

Tasks

  1. Train nanoGPT at 3 scales: 10M, 50M, 124M parameters
  2. Plot loss vs compute (FLOPs)
  3. Experiment with PyTorch DDP on 2 GPUs
  4. Compare training efficiency (tokens/sec)
  5. Analyze scaling behavior and extrapolate trends
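For tasks 2 and 5, a minimal analysis sketch is shown below. It uses the common rule-of-thumb estimate of training compute, C ≈ 6·N·D (N = parameters, D = tokens seen), and fits a power law L = a·C^b in log-log space. The token counts and loss values here are hypothetical placeholders; substitute the numbers from your own runs.

```python
import numpy as np

# Approximate training compute per run: C ≈ 6 * N * D
params = np.array([10e6, 50e6, 124e6])   # model sizes from the lab
tokens = np.array([2e9, 2e9, 2e9])       # hypothetical: tokens seen per run
flops = 6 * params * tokens

# Hypothetical final validation losses; replace with your measurements.
losses = np.array([3.10, 2.85, 2.70])

# Fit L = a * C^b by linear regression in log-log space
# (b should come out negative: loss falls as compute grows).
b, log_a = np.polyfit(np.log(flops), np.log(losses), 1)
a = np.exp(log_a)
print(f"fitted exponent b: {b:.4f}")

# Extrapolate the fitted trend to a run with 10x the largest compute budget.
c_next = flops[-1] * 10
predicted = a * c_next ** b
print(f"predicted loss at {c_next:.2e} FLOPs: {predicted:.3f}")
```

Plotting `losses` against `flops` on log-log axes alongside the fitted line gives the loss-vs-compute curve asked for in task 2.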

Resources