28: Let It Run
Show Notes
On March 7th, 2026, Andrej Karpathy released autoresearch — a 630-line Python repo that lets an AI agent run autonomous ML experiments overnight. The agent modifies training code, runs 5-minute experiments, keeps improvements, discards failures, and repeats. ~100 cycles while you sleep. Karpathy watched val_bpb drop from 1.0 to 0.97 without touching anything. Shopify CEO Tobi Lutke adapted it the same night and got a 19% improvement — with the smaller agent-optimized model eventually outperforming a larger manually configured one. The engineering task has shifted: you're not tuning the model anymore, you're writing the instructions that tell the agent how to tune the model.
The Repo
karpathy/autoresearch https://github.com/karpathy/autoresearch
Three files:
- prepare.py — fixed constants, one-time data prep, runtime utilities. Do not modify.
- train.py — the full GPT model, optimizer (Muon + AdamW), and training loop. The agent modifies this.
- program.md — your instructions to the agent. The only file the human iterates on.
How It Works
- Agent reads program.md
- Agent modifies train.py (architecture, hyperparameters, optimizer — anything)
- Fixed 5-minute training run (wall clock)
- Evaluate val_bpb (validation bits per byte — lower is better, architecture-agnostic)
- If improved: commit to git branch. If not: discard.
- Repeat. ~12 experiments/hour, ~100 overnight.
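The keep-if-improved loop above can be sketched in a few lines of Python. This is a toy illustration, not the actual autoresearch code: `propose_edit` is a hypothetical stand-in for the agent editing train.py and running one 5-minute experiment, and the random walk it returns is invented for the demo.

```python
import random

def propose_edit(current_bpb: float) -> float:
    """Hypothetical stand-in for one agent cycle: edit train.py,
    train for 5 minutes, and return the resulting val_bpb.
    Here it's simulated as a small random perturbation."""
    return current_bpb + random.uniform(-0.01, 0.005)

def overnight(cycles: int = 100, start_bpb: float = 1.0) -> float:
    """Run ~100 cycles: keep a candidate only if val_bpb improves
    (lower is better), otherwise discard and retry from the last
    committed state — mirroring the commit/discard rule above."""
    best = start_bpb
    for _ in range(cycles):
        candidate = propose_edit(best)
        if candidate < best:   # improvement: "commit to git branch"
            best = candidate
        # else: discard the edit; next cycle starts from `best`
    return best

random.seed(0)
final = overnight()
```

Because losing candidates are discarded, `final` can never be worse than the starting val_bpb, which is the property that makes an unattended overnight run safe.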
The 630-line constraint is intentional: the entire codebase fits in a model's context window, so the agent can reason about the whole file at once.
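On the metric itself: bits per byte is just cross-entropy expressed in bits instead of nats, normalized per byte of raw text, which is why it's comparable across architectures and tokenizers. A generic conversion (standard information theory, not code from the repo):

```python
import math

def nats_per_byte_to_bpb(loss_nats: float) -> float:
    """Convert a cross-entropy loss in nats per byte into
    bits per byte (bpb): divide by ln 2."""
    return loss_nats / math.log(2)

# A loss of ln(2) ≈ 0.6931 nats/byte is exactly 1.0 bit/byte
print(round(nats_per_byte_to_bpb(math.log(2)), 6))  # 1.0
```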
Key Results
- Karpathy's run: val_bpb 1.0 → 0.97 autonomously. Agent-discovered optimizations were integrated back into nanochat (his main production training codebase).
- Tobi Lutke: Adapted autoresearch for an internal Shopify model the night it dropped. 19% improvement in validation scores. The smaller agent-optimized model outperformed a larger manually configured one.
The Concept
Karpathy describes program.md as the "research org code" — you're not programming the model, you're programming the organization that does the research. The human's job shifts from running experiments to writing the instructions that tell the agent how to run experiments well.
His README opens with fictional lore: the agents are now in the 10,205th generation of the codebase, "a self-modifying binary that has grown beyond human comprehension." He's joking. The repo exists and you can clone it tonight.
Links
- Repo: https://github.com/karpathy/autoresearch
- Karpathy's tweet: https://x.com/karpathy/status/2030371219518931079
- Tobi Lutke's tweet: https://x.com/tobi/status/2030771823151853938
- nanochat (parent repo): https://github.com/karpathy/nanochat