Andrej Karpathy's LLM Analogies

LLMs as Chatbots

Interacting with ChatGPT is like having a conversation with a friend using chat bubbles.

Hey there! How can I help you today?
Can you explain how AI works?
Tokens as Building Blocks

Text is chopped into tokens, which are the small text chunks that the model processes.

Hello
world
how
are
you
Context Window as Working Memory

The context window is the working memory of tokens - anything inside is directly accessible.

Token 1
Token 2
Token 3
...
Token N
Pre-training as Compression

Pre-training is like compressing all of the internet into a single, lossy, probabilistic zip file.

Parameters as Knowledge

The contents of the zip file are the parameters of a neural network, with a 1TB file equating to roughly one trillion parameters.

1,000,000,000,000 parameters
LLMs as Internet Document Generators

During pre-training, the model acts as an internet document generator.

Document 1
Document 2
Document 3
Post-training as Adding Personality

The post-training stage is like attaching a smiley face to the zip file, giving the model the persona of an assistant.

LLMs as Self-Contained Entities

A language model is a fully self-contained entity, like a 1TB file representing one trillion parameters, without built-in tools.

No calculator
No browser
Thinking Models
Hmm, let me think...

Karpathy represents thinking models as an emoji with an optional thinking bubble.

LLMs as Junior Data Analysts

LLMs are like very junior data analysts that can plot figures but require supervision because they're a little absent-minded.

Needs supervision
AI Podcast

Custom AI podcasts are like having conversations about any arbitrary niche topic, which can be therapeutic for specialized interests.

Chess
Gardening
Quantum Physics