Week 9: Fine-tuning and Alignment

RLHF, instruction tuning, and making models useful

Time Estimate: 3-4 hours

Topics Covered

Featured Speaker


Sam Altman

CEO, OpenAI

Learn from industry leaders who are building the future of AI infrastructure and applications.

Video Resources

📹 Video content coming soon

Videos include keynotes, technical talks, and tutorials from industry leaders.

Reading Materials

📚 Reading list coming soon

Research papers, blog posts, and technical documentation.

🛠️ Hands-On Lab

Fine-tune with RLHF & DPO

Difficulty: Intermediate · Estimated time: 3 hours

Objective

Align a small language model using supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO).

Prerequisites

  • PyTorch and Hugging Face Transformers
  • Understanding of RLHF concepts
  • Google Colab with GPU
  • Hugging Face account

Setup Instructions

  1. Install the TRL library: pip install trl transformers datasets
  2. Log in to Hugging Face: huggingface-cli login
  3. Clone the starter repo: git clone https://github.com/stanford-cs153/rlhf-lab

Tasks

  1. Supervised fine-tune GPT-2 on an instruction dataset
  2. Prepare a preference dataset (chosen/rejected response pairs)
  3. Implement a DPO training loop with TRL
  4. Compare the aligned model's outputs against the unaligned baseline
  5. Qualitatively assess the improvement in helpfulness
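Before wiring up TRL for task 3, it helps to see what the DPO objective actually computes. The sketch below implements the per-pair DPO loss from first principles in plain Python: it rewards the policy for assigning relatively more probability to the chosen response (versus the frozen reference model) than to the rejected one. The function name and the numeric log-probabilities are illustrative, not from the lab repo; in practice TRL's trainer computes the same quantity batched over token-level log-probs.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model. beta controls how far the policy may drift
    from the reference.
    """
    # Implicit rewards: how much more (or less) likely the policy
    # makes each response, relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as the numerically stable
    # softplus form log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical values: the policy already prefers the chosen response
# slightly more than the reference does, so the loss dips below log(2).
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0, beta=0.1)
print(round(loss, 4))  # → 0.5981
```

Note the sanity checks this enables: when policy and reference agree exactly, the margin is zero and the loss is log(2); as the policy's preference for the chosen response grows, the loss falls toward zero. For task 2, TRL's DPO trainer expects each preference record as a dict with "prompt", "chosen", and "rejected" text fields.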

Resources