Week 6: The Transformer Revolution
Understanding the architecture that powers GPT, BERT, and modern LLMs
Time Estimate: 3-4 hours
Topics Covered
- Self-attention mechanisms explained
- Positional encoding and multi-head attention
- Encoder vs decoder architectures
- Building GPT from scratch
- Visualizing attention patterns
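The topics above center on self-attention, which the lab later implements. As a preview, here is a minimal sketch of causal (masked) scaled dot-product self-attention in PyTorch; the function and weight names (`causal_self_attention`, `W_q`, `W_k`, `W_v`) are illustrative, not from any course starter code:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head causal self-attention (illustrative sketch).

    x:   (batch, seq_len, d_model) token embeddings
    W_*: (d_model, d_head) projection matrices
    """
    q, k, v = x @ W_q, x @ W_k, x @ W_v          # project to queries/keys/values
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5  # (batch, T, T) scaled dot products
    T = x.size(1)
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))  # lower triangle = allowed
    scores = scores.masked_fill(~mask, float('-inf'))      # block attention to future tokens
    weights = F.softmax(scores, dim=-1)                    # each row sums to 1
    return weights @ v                                     # weighted sum of values
```

Because of the causal mask, the output at position t depends only on tokens 0..t; multi-head attention (covered in the lab) runs several such heads in parallel and concatenates their outputs.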
Featured Speaker
Andrej Karpathy
Founding member, OpenAI
Learn from industry leaders who are building the future of AI infrastructure and applications.
Video Resources
📹 Video content coming soon.
Videos include keynotes, technical talks, and tutorials from industry leaders.
Reading Materials
📚 Reading list coming soon.
Research papers, blog posts, and technical documentation.
🛠️ Hands-On Lab
Build a Transformer from Scratch
Difficulty: Intermediate | Time: 4 hours
Objective
Implement multi-head attention and train a character-level language model, deepening your understanding of transformer internals.
Prerequisites
- PyTorch fundamentals
- Understanding of attention mechanisms
- Python 3.8+
- Linear algebra basics
Setup Instructions
- Open Google Colab with GPU runtime
- Install dependencies:
pip install torch numpy bertviz
- Download Shakespeare dataset:
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
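Once `input.txt` is downloaded, the character-level setup needs a vocabulary and encode/decode mappings. A minimal sketch (the helper names `build_vocab`, `encode`, and `decode` are illustrative, not from any provided starter code):

```python
def build_vocab(text):
    """Map every distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
    itos = {i: ch for ch, i in stoi.items()}      # id -> char
    return stoi, itos

def encode(s, stoi):
    """String -> list of token ids."""
    return [stoi[c] for c in s]

def decode(ids, itos):
    """List of token ids -> string."""
    return ''.join(itos[i] for i in ids)
```

In the lab you would call `build_vocab(open('input.txt').read())`; the full Shakespeare file yields a vocabulary of a few dozen characters, which is what makes character-level modeling tractable on a single Colab GPU.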
Tasks
- Implement multi-head self-attention from scratch
- Build a complete transformer decoder block
- Train on Shakespeare text (character-level)
- Visualize attention patterns with BertViz
- Generate text samples and analyze quality
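Tasks 1 and 2 come together in a decoder block. One possible shape for it, sketched here with PyTorch's built-in `nn.MultiheadAttention` for brevity (the lab asks you to implement the attention yourself, and the class name `DecoderBlock` and pre-norm layout are assumptions, not a prescribed design):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """GPT-style decoder block: causal self-attention + feed-forward,
    each with a residual connection and pre-layer normalization."""

    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True entries are positions attention may NOT look at
        # (the strict upper triangle, i.e. future tokens).
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                     # residual around attention
        x = x + self.ff(self.ln2(x))  # residual around feed-forward
        return x
```

A full model would stack several such blocks between a token+positional embedding layer and a final linear head over the vocabulary, then train with cross-entropy on next-character prediction.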