# Experiments
Experiments are how we validate architectural decisions before committing to them. Each experiment has a spec (the hypothesis and test protocol), an implementation (isolated code and infrastructure), and results (evidence that informs the decision).
## How It Works
Every experiment follows a spec-first workflow:
- Draft a spec at `specs/experiments/{id}.md`: hypothesis, what we're proving, test protocol, success criteria
- Get the spec reviewed before writing any code
- Implement in an isolated workspace at `experiments/{id}/`: its own dependencies, tests, and (if needed) cloud infrastructure
- Run tests and benchmarks as defined in the spec
- Write results to `experiments/{id}/results/README.md`: what was tested, what was learned, key numbers, conclusion
- Close: set `status: closed` in the spec, post findings to related GitHub issues, write a journal post
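A minimal spec skeleton consistent with this workflow. The `status` and `depends` fields appear elsewhere in this document; the `id` field, section names, and example values are assumptions for illustration — the `/experimenter` skill defines the real format:

```markdown
---
id: cdn-latency      # hypothetical experiment id
status: open         # set to `closed` when the experiment is done
depends: []          # ids of experiments this one builds on
---

## Hypothesis
Serving assets through a CDN cuts p95 fetch latency for cached objects.

## Test Protocol
How the hypothesis will be tested: workloads, environments, measurements.

## Success Criteria
The concrete numbers or outcomes that would confirm or refute the hypothesis.
```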
The spec is the golden record. Implementation serves the spec, not the other way around. When iterating, update the spec first.
## Experiment Structure
```
specs/experiments/{id}.md   # the spec (hypothesis, protocol, criteria)
experiments/{id}/
  README.md                 # quickstart and implementation notes
  pyproject.toml            # isolated dependencies
  bin/                      # test, benchmark, deploy, teardown scripts
  src/                      # implementation code
  tests/                    # validation tests
  results/                  # benchmark data and results README
  tf/                       # OpenTofu config (if infra needed)
```
## Dependency DAG
Experiments declare dependencies on other experiments via `depends:` in their spec frontmatter. A dependent experiment can build on the code, infrastructure, or findings of its dependencies.
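Because `depends:` lists form a DAG, a valid run order is just a topological sort of the declared edges. A minimal sketch, assuming the dependency map has already been read out of the spec frontmatter (the experiment ids here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical `depends:` declarations, as they might be collected
# from specs/experiments/*.md frontmatter. Keys are experiment ids,
# values are the ids they build on.
depends = {
    "asset-registry": [],
    "cdn-latency": ["asset-registry"],
    "edge-cache": ["asset-registry", "cdn-latency"],
}

# TopologicalSorter raises CycleError if the declarations form a cycle,
# and otherwise yields each experiment only after its dependencies.
order = list(TopologicalSorter(depends).static_order())
print(order)
```

Running this prints `['asset-registry', 'cdn-latency', 'edge-cache']`: the foundational experiment first, dependents after everything they build on.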
## Continuous Refinement
The set of open experiment specs should be continuously refined as experiments progress. Closing one experiment often reveals new questions, invalidates assumptions in planned experiments, or shifts priorities. Before starting a new batch, review all open specs and update or retire any that no longer reflect the current understanding.
## Writing New Experiments
Experiments are authored with the `/experimenter` skill in Claude Code:

```
/experimenter create an experiment for testing CDN latency
/experimenter implement the asset-registry experiment
```
The skill handles the spec format, implementation structure, closing checklist, and issue comment templates.