Interacting with ChatGPT is like having a text conversation with a friend, with messages exchanged back and forth in chat bubbles.
Text is chopped into tokens, the small chunks of text that are the actual units the model processes.
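A minimal sketch of tokenization using OpenAI's open-source tiktoken library (the "cl100k_base" encoding is just one example; every model family ships its own tokenizer):

```python
# Sketch of tokenization with tiktoken; the encoding name is one
# example, and exact token splits differ across models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Interacting with ChatGPT is fun!")
print(tokens)                # a list of integer token IDs
print(len(tokens), "tokens")

# Each token ID maps back to a small chunk of text.
for t in tokens:
    print(t, repr(enc.decode([t])))
```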
The context window is the model's working memory of tokens: anything inside it is directly accessible to the model.
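A hedged sketch of what that implies in practice: once a conversation outgrows the window, the oldest tokens must be dropped (or summarized) before the model can respond. The MAX_CONTEXT value and the trimming strategy below are illustrative assumptions, not any particular model's behavior:

```python
# Keep only the most recent tokens that fit in the context window.
# MAX_CONTEXT is an illustrative size; real windows vary by model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT = 8192

def trim_to_window(history: str) -> str:
    """Drop the oldest tokens so the history fits in the window."""
    tokens = enc.encode(history)
    if len(tokens) <= MAX_CONTEXT:
        return history
    return enc.decode(tokens[-MAX_CONTEXT:])
```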
Pre-training is like compressing all of the internet into a single, lossy, probabilistic zip file.
The contents of the zip file are the parameters of a neural network; at roughly one byte per parameter, a 1TB file corresponds to about one trillion parameters.
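The arithmetic behind that figure, assuming roughly one byte per parameter (as with 8-bit quantization); other precisions scale linearly:

```python
# Back-of-the-envelope: file size = parameter count * bytes per parameter.
params = 1_000_000_000_000  # one trillion parameters

for name, bytes_per_param in [("int8", 1), ("fp16", 2), ("fp32", 4)]:
    size_tb = params * bytes_per_param / 1e12
    print(f"{name}: {size_tb:.0f} TB")
# int8: 1 TB, fp16: 2 TB, fp32: 4 TB
```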
During pre-training, the model acts as an internet document generator: given some text, it repeatedly predicts what comes next.
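A minimal sketch of that generation loop: predict a probability distribution over the next token, sample from it, append, and repeat. The `model` and `tokenizer` objects here are hypothetical stand-ins, not a real API:

```python
# Autoregressive sampling sketch; `model` and `tokenizer` are
# hypothetical placeholders for a pre-trained model's interface.
import random

def generate(model, tokenizer, prompt: str, max_tokens: int = 100) -> str:
    tokens = tokenizer.encode(prompt)
    for _ in range(max_tokens):
        probs = model.next_token_probs(tokens)  # distribution over the vocabulary
        next_token = random.choices(range(len(probs)), weights=probs)[0]
        tokens.append(next_token)
    return tokenizer.decode(tokens)
```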
The post-training stage is like attaching a smiley face to the zip file, giving the model the persona of an assistant.
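Concretely, post-training exposes the model to conversations in a structured, role-based format, which is what gives it the assistant persona. The message layout below follows the common OpenAI-style convention as an illustration:

```python
# Role-based conversation format used during post-training and by
# chat APIs (OpenAI-style convention, shown here as an illustration).
messages = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "Why is the sky blue?"},
    {"role": "assistant", "content": "Because shorter (blue) wavelengths "
                                     "of sunlight scatter more in the atmosphere."},
]
# Under the hood, these messages are rendered into a single token
# stream, with special tokens marking each turn, before the model sees them.
```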
By itself, a language model is a fully self-contained entity, just that 1TB file of one trillion parameters, with no built-in tools.
Karpathy represents thinking models as an emoji with an optional thinking bubble.
LLMs are like very junior data analysts that can plot figures but require supervision because they're a little absent-minded.
Custom AI podcasts let you listen to conversations about any arbitrary niche topic, which can feel therapeutic for those with specialized interests.