Experiments

Experiments are how we validate architectural decisions before committing to them. Each experiment has a spec (the hypothesis and test protocol), an implementation (isolated code and infrastructure), and results (evidence that informs the decision).

How It Works

Every experiment follows a spec-first workflow:

  1. Draft a spec at specs/experiments/{id}.md — hypothesis, what we're proving, test protocol, success criteria
  2. Get the spec reviewed before writing any code
  3. Implement in an isolated workspace at experiments/{id}/ — its own dependencies, tests, and (if needed) cloud infrastructure
  4. Run tests and benchmarks as defined in the spec
  5. Write results to experiments/{id}/results/README.md — what was tested, what was learned, key numbers, conclusion
  6. Close — set status: closed in the spec, post findings to related GitHub issues, write a journal post
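A spec drafted in step 1 might look like the sketch below. Only the status: and depends: keys appear elsewhere in this document; the id, the section headings, and the cdn-latency example are illustrative assumptions, not a fixed schema:

```markdown
---
id: cdn-latency        # matches specs/experiments/{id}.md and experiments/{id}/
status: open           # set to closed when the experiment is done
depends: []            # ids of prerequisite experiments, if any
---

# CDN latency

## Hypothesis
What we expect to be true, stated so it can be falsified.

## Test protocol
How the tests and benchmarks in experiments/{id}/ will exercise the hypothesis.

## Success criteria
The concrete numbers or outcomes that would confirm or reject it.
```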

The spec is the golden record. Implementation serves the spec, not the other way around. When iterating, update the spec first.

Experiment Structure

specs/experiments/{id}.md          # the spec (hypothesis, protocol, criteria)
experiments/{id}/
  README.md                        # quickstart and implementation notes
  pyproject.toml                   # isolated dependencies
  bin/                             # test, benchmark, deploy, teardown scripts
  src/                             # implementation code
  tests/                           # validation tests
  results/                         # benchmark data and results README
  tf/                              # OpenTofu config (if infra needed)
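The layout above can be checked mechanically. A minimal sketch of such a check, assuming the function name, the treatment of tf/ as optional, and the flat list of required entries (none of this tooling is prescribed by the structure itself):

```python
from pathlib import Path

# Entries every experiment workspace is expected to contain.
REQUIRED = ["README.md", "pyproject.toml", "bin", "src", "tests", "results"]
OPTIONAL = ["tf"]  # only present when the experiment needs cloud infrastructure

def missing_entries(experiment_dir):
    """Return the required entries absent from an experiment workspace."""
    root = Path(experiment_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

For example, missing_entries("experiments/asset-registry") returns an empty list when that workspace is complete.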

Dependency DAG

Experiments declare dependencies on other experiments via depends: in their spec frontmatter. A dependent experiment can build on the code, infrastructure, or findings of its dependencies.
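The run order implied by those declarations is a topological sort of the DAG. A minimal sketch using the standard library, assuming the depends: values have already been parsed out of each spec's frontmatter into a dict (the parsing step and the function name are assumptions):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_order(depends):
    """Order experiment ids so every dependency runs before its dependents.

    depends maps each experiment id to the list of ids declared under its
    depends: frontmatter key. Raises graphlib.CycleError on circular
    declarations, which would make the batch unrunnable.
    """
    return list(TopologicalSorter(depends).static_order())
```

For example, run_order({"cdn-latency": ["asset-registry"], "asset-registry": []}) places asset-registry before cdn-latency.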

Continuous Refinement

The set of open experiment specs should be continuously refined as experiments progress. Closing one experiment often reveals new questions, invalidates assumptions in planned experiments, or shifts priorities. Before starting a new batch, review all open specs and update or retire any that no longer reflect the current understanding.

Writing New Experiments

Experiments are authored with the /experimenter skill in Claude Code:

/experimenter create an experiment for testing CDN latency
/experimenter implement the asset-registry experiment

The skill handles the spec format, implementation structure, closing checklist, and issue comment templates.