28: Let It Run
Show Notes
On March 7th, 2026, Andrej Karpathy released autoresearch — a 630-line Python repo that lets an AI agent run autonomous ML experiments overnight. The agent modifies training code, runs 5-minute experiments, keeps improvements, discards failures, and repeats. ~100 cycles while you sleep. Karpathy watched val_bpb drop from 1.0 to 0.97 without touching anything. Shopify CEO Tobi Lutke adapted it the same night and got a 19% improvement — with the smaller agent-optimized model eventually outperforming a larger manually configured one. The engineering task has shifted: you're not tuning the model anymore, you're writing the instructions that tell the agent how to tune the model.
The Repo
karpathy/autoresearch https://github.com/karpathy/autoresearch
Three files:
- prepare.py — fixed constants, one-time data prep, runtime utilities. Do not modify.
- train.py — the full GPT model, optimizer (Muon + AdamW), and training loop. The agent modifies this.
- program.md — your instructions to the agent. The only file the human iterates on.
How It Works
- Agent reads program.md
- Agent modifies train.py (architecture, hyperparameters, optimizer — anything)
- Fixed 5-minute training run (wall clock)
- Evaluate val_bpb (validation bits per byte — lower is better, architecture-agnostic)
- If improved: commit to git branch. If not: discard.
- Repeat. ~12 experiments/hour, ~100 overnight.
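The keep-if-improved loop above can be sketched in a few lines of Python. This is a toy illustration, not the actual autoresearch code: `propose_edit` is a hypothetical stand-in for the agent editing train.py and running one 5-minute experiment, and the random walk it returns is invented for the demo.

```python
import random

def propose_edit(current_bpb: float) -> float:
    """Hypothetical stand-in for one agent cycle: edit train.py,
    train for 5 minutes, and return the resulting val_bpb.
    Here it's simulated as a small random perturbation."""
    return current_bpb + random.uniform(-0.01, 0.005)

def overnight(cycles: int = 100, start_bpb: float = 1.0) -> float:
    """Run ~100 cycles: keep a candidate only if val_bpb improves
    (lower is better), otherwise discard and retry from the last
    committed state — mirroring the commit/discard rule above."""
    best = start_bpb
    for _ in range(cycles):
        candidate = propose_edit(best)
        if candidate < best:   # improvement: "commit to git branch"
            best = candidate
        # else: discard the edit; next cycle starts from `best`
    return best

random.seed(0)
final = overnight()
```

Because losing candidates are discarded, `final` can never be worse than the starting val_bpb, which is the property that makes an unattended overnight run safe.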
The 630-line constraint is intentional: the entire codebase fits in a model's context window, so the agent can reason about the whole file at once.
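On the metric itself: bits per byte is just cross-entropy expressed in bits instead of nats, normalized per byte of raw text, which is why it's comparable across architectures and tokenizers. A generic conversion (standard information theory, not code from the repo):

```python
import math

def nats_per_byte_to_bpb(loss_nats: float) -> float:
    """Convert a cross-entropy loss in nats per byte into
    bits per byte (bpb): divide by ln 2."""
    return loss_nats / math.log(2)

# A loss of ln(2) ≈ 0.6931 nats/byte is exactly 1.0 bit/byte
print(round(nats_per_byte_to_bpb(math.log(2)), 6))  # 1.0
```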
Key Results
- Karpathy's run: val_bpb 1.0 → 0.97 autonomously. Agent-discovered optimizations were integrated back into nanochat (his main production training codebase).
- Tobi Lutke: Adapted autoresearch for an internal Shopify model the night it dropped. 19% improvement in validation scores. The smaller agent-optimized model outperformed a larger manually configured one.
The Concept
Karpathy describes program.md as the "research org code" — you're not programming the model, you're programming the organization that does the research. The human's job shifts from running experiments to writing the instructions that tell the agent how to run experiments well.
His README opens with fictional lore: the agents are now in the 10,205th generation of the codebase, "a self-modifying binary that has grown beyond human comprehension." He's joking. The repo exists and you can clone it tonight.
Links
- Repo: https://github.com/karpathy/autoresearch
- Karpathy's tweet: https://x.com/karpathy/status/2030371219518931079
- Tobi Lutke's tweet: https://x.com/tobi/status/2030771823151853938
- nanochat (parent repo): https://github.com/karpathy/nanochat