The creator of Keras on what LLMs actually are — and what they aren’t.


Key Insights

LLMs as vector functions: LLMs aren't discrete programs with conditional logic. They're vector functions: continuous mappings from input vectors to output vectors, implemented as learned curves.

They’re not like the discrete programs you might imagine, such as a Python program. They’re actually vector functions.
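
A minimal sketch of the distinction, using NumPy and hypothetical toy functions (nothing here comes from a real LLM): a discrete program branches on exact conditions, while a vector function maps input vectors to output vectors through a smooth curve, so nearby inputs give nearby outputs.

```python
import numpy as np

# Discrete program: exact conditions, hard branches, no notion of "nearby" inputs.
def discrete_program(token: str) -> str:
    if token == "hello":
        return "greeting"
    elif token == "bye":
        return "farewell"
    return "unknown"

# Vector function: a continuous mapping from input vectors to output vectors.
# A tiny randomly initialized MLP stands in for a trained model layer.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 8))

def vector_function(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W1) @ W2  # a smooth curve, not a branch

x = rng.normal(size=8)
eps = 1e-3 * rng.normal(size=8)
# Nudging the input nudges the output continuously, unlike a hard branch.
print(np.linalg.norm(vector_function(x + eps) - vector_function(x)))
```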

Compression forces learning: With infinite memory, an LLM could just be a lookup table. But limited parameters force compression — so it learns predictive functions instead.
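
A toy illustration of the compression argument (the data and model here are made up for illustration): a lookup table with unlimited memory memorizes every observed pair exactly but has nothing to say about unseen inputs, while a model with far fewer parameters than data points is forced to fit a predictive function that generalizes.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.uniform(-3, 3, size=1000)
ys = np.sin(xs)  # the "world" the model must predict

# Unlimited memory: a lookup table stores every observed pair verbatim...
table = dict(zip(xs.tolist(), ys.tolist()))
# ...but it has no answer for an input it never saw.
assert 0.12345 not in table

# Limited parameters: fit a degree-7 polynomial (8 parameters vs. 1000 rows).
coeffs = np.polyfit(xs, ys, deg=7)
# Compression forces a predictive function, so unseen inputs still work.
print(np.polyval(coeffs, 0.12345), np.sin(0.12345))
```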

Style transfer as efficiency: It's more compressive to learn style independently of content. That's why LLMs can do textual style transfer: they learn millions of independent predictive functions and combine them via interpolation.
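
A sketch of the interpolation idea in a made-up embedding space (all vectors below are hypothetical stand-ins, not outputs of a real model): if style and content are learned as separate directions, transferring style amounts to swapping a style offset, and blending styles is linear interpolation between style vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
content = rng.normal(size=64)  # "what is said"
formal = rng.normal(size=64)   # a "formal" style direction
casual = rng.normal(size=64)   # a "casual" style direction

def stylize(content_vec, style_vec):
    # Factorized representation: output = content + style.
    return content_vec + style_vec

# Style transfer: keep the content, swap the style offset.
formal_version = stylize(content, formal)
casual_version = formal_version - formal + casual

# Blended style: interpolate between the two style directions.
t = 0.3
blended = stylize(content, (1 - t) * formal + t * casual)
```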

Compositionality: Because these are vector functions, you can sum them and interpolate between them to produce new functions. This is fundamentally different from discrete programs.
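
A tiny sketch of what summing and interpolating functions means, using hypothetical toy functions: pointwise combinations of vector functions are themselves valid vector functions, an operation with no meaningful analogue for two arbitrary discrete programs.

```python
import numpy as np

def f(x):  # one learned vector function (toy stand-in)
    return np.tanh(x)

def g(x):  # another
    return np.sin(x)

def interpolate(f, g, t):
    # Pointwise interpolation yields a brand-new function, something
    # you cannot do with two arbitrary Python programs.
    return lambda x: (1 - t) * f(x) + t * g(x)

h = interpolate(f, g, 0.5)
print(h(np.array([0.1, 0.2])))
```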


Listen to the episode