
The Control Console API

March 27, 2026 — Phase 5 Build

The Question

The system is running — feeds are fetched, transformed, and published to CDN. But the only way to see what's happening is to read Cloud Run logs. Can we add a REST API layer that exposes system status, connected workers, registered assets, and pipeline run history through simple HTTP endpoints?

What We Tried

  • Built a Fastify web service that proxies REST requests to the orchestrator's existing gRPC query handlers
  • Used proto-loader to create a gRPC client from the same .proto files the orchestrator serves
  • Deployed to Cloud Run with auto-refreshing IAM ID tokens for service-to-service authentication
  • Separated the startup health probe from the orchestrator connectivity check

What We Found

  1. The gRPC-to-REST translation was trivial — 70 lines of route code. The orchestrator already had all five query handlers (GetStatus, GetWorkers, GetAssets, GetRuns, GetRunSteps) from Phase 2. The web service is just a thin HTTP wrapper that calls them. Building the query handlers into the orchestrator first turned out to be the right call.
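A dependency-free sketch of what that thin wrapper looks like (route paths and handler names follow the source; the `OrchestratorClient` shape is an assumption, and the real service registers these as Fastify GET routes):

```typescript
// Hypothetical sketch: the REST layer is a table of thin wrappers,
// one per orchestrator gRPC query handler from Phase 2.
type QueryHandler = (request: object) => Promise<unknown>;

interface OrchestratorClient {
  GetStatus: QueryHandler;
  GetWorkers: QueryHandler;
  GetAssets: QueryHandler;
  GetRuns: QueryHandler;
}

// Each route forwards to exactly one RPC and returns its response as JSON.
function buildRoutes(client: OrchestratorClient) {
  return {
    "/api/v1/status": () => client.GetStatus({}),
    "/api/v1/workers": () => client.GetWorkers({}),
    "/api/v1/assets": () => client.GetAssets({}),
    "/api/v1/runs": () => client.GetRuns({}),
  };
}
```

The fifth handler, GetRunSteps, takes a run id and would be a parameterized route, omitted here for brevity.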

  2. Service-to-service auth on Cloud Run is the same pattern everywhere. Both the pipeline worker and the web service needed ID tokens from the metadata server to call the orchestrator. The web service uses the same auto-refreshing pattern (fetch token per-RPC) that we built for the worker. Cloud Run requires TLS + IAM auth even between internal services within the same VPC.
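The per-RPC token fetch can be sketched roughly as below. The metadata-server URL and `Metadata-Flavor` header are the documented GCE conventions; the function names are assumptions, and `fetch` is injected via a minimal type so the sketch stays self-contained:

```typescript
// Sketch (names assumed): mint a fresh ID token from the GCE metadata
// server before each RPC, with the orchestrator's URL as the audience.
const METADATA_IDENTITY_URL =
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity";

function identityTokenUrl(audience: string): string {
  return `${METADATA_IDENTITY_URL}?audience=${encodeURIComponent(audience)}`;
}

// Minimal response shape so the sketch avoids DOM typings;
// in the service this is just Node's global fetch.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

async function fetchIdToken(audience: string, fetchFn: FetchLike): Promise<string> {
  const res = await fetchFn(identityTokenUrl(audience), {
    headers: { "Metadata-Flavor": "Google" }, // required by the metadata server
  });
  if (!res.ok) throw new Error(`metadata server returned ${res.status}`);
  return res.text(); // body is the raw JWT, attached as the call's Bearer token
}
```

Fetching per call avoids tracking token expiry in the client; the metadata server is local to the instance, so the lookup is cheap.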

  3. Health probes and connectivity checks are different things. The first deploy failed because the health check tried to call the orchestrator — which takes a moment to become reachable after the web service starts. Splitting into /health (always 200 if the process is up) and /health/deep (checks orchestrator gRPC) was the fix. Cloud Run's startup probe needs the lightweight one.
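The split can be sketched as two handlers with different dependency footprints (handler names and response shapes are assumptions; the real service exposes these as Fastify routes, and Cloud Run's startup probe targets the shallow one):

```typescript
// /health: liveness only. Always 200 while the process is up,
// so the startup probe never depends on the orchestrator.
function healthShallow() {
  return { code: 200, body: { status: "ok", service: "continuous-gtfs-web" } };
}

interface OrchestratorPing {
  connected: boolean;
  workers?: number;
  runs?: number;
}

// /health/deep: also reports orchestrator connectivity. Takes the result
// of a gRPC ping (null when the call failed) and degrades to 503.
function healthDeep(ping: OrchestratorPing | null) {
  if (ping && ping.connected) {
    return { code: 200, body: { status: "ok", orchestrator: ping } };
  }
  // The web process itself is alive; only the dependency check failed.
  return { code: 503, body: { status: "degraded", orchestrator: { connected: false } } };
}
```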

  4. 6,306 pipeline runs were visible through the API on first deploy. The system has been processing feeds continuously since Phase 4. Every 20 seconds a new run completes, and now they're all queryable through curl.

  5. Proto enum strings leak through to API responses. The orchestrator stores "completed" in the database, but the gRPC query response wraps it as a RunStatus enum. Proto-loader doesn't match the lowercase "completed" to any enum value, so the API shows RUN_STATUS_UNSPECIFIED. A reverse mapping is needed, though the issue is display-only: the underlying data is correct.
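The needed mapping is a small lookup table. Only RUN_STATUS_UNSPECIFIED appears in the source, so the other enum names and the set of stored statuses below are assumptions:

```typescript
// Sketch (enum names assumed): map the lowercase status strings stored in
// the database onto proto enum names before serializing the API response.
const RUN_STATUS_BY_DB_VALUE: Record<string, string> = {
  pending: "RUN_STATUS_PENDING",
  running: "RUN_STATUS_RUNNING",
  completed: "RUN_STATUS_COMPLETED",
  failed: "RUN_STATUS_FAILED",
};

function toProtoStatus(dbValue: string): string {
  // Unknown values fall back to the proto default, matching current behavior.
  return RUN_STATUS_BY_DB_VALUE[dbValue] ?? "RUN_STATUS_UNSPECIFIED";
}
```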

What It Looks Like

The production API, responding to real data:

$ curl -s https://continuous-gtfs-web-235382118100.us-west1.run.app/health/deep | jq .
{
  "status": "ok",
  "service": "continuous-gtfs-web",
  "orchestrator": {
    "connected": true,
    "workers": 1,
    "assets": 8,
    "runs": 6306
  }
}

$ curl -s .../api/v1/workers | jq '.workers[0]'
{
  "capabilities": ["schedule", "realtime"],
  "worker_id": "worker-1bdcbe65",
  "version": "dev",
  "connected_since": "2026-03-27T03:25:07.385Z"
}

The Decision

All five MVP phases are complete — the system is live, processing real feeds, and observable through a REST API. The web service adds the final layer: HTTP access to system status, worker health, asset inventory, and run history.

What This Means

  • Anyone can check system health with curl — no GCP console access needed
  • The API returns real production data: 8 assets, 1 worker, 6,000+ completed runs
  • 12 tracked issues remain for future work (CI/CD automation, schedule pipeline, HTTPS, package publishing)
  • The agency repo pattern is validated — Sound Transit's transforms live in a separate repo with their own Docker image
  • From experiments to production in one week: 10 spec files, 5 build phases, 40+ commits, ~30 code reviews

Open Questions

  • When do we wire up the schedule pipeline worker? It's the biggest remaining P2 issue.
  • Should we add the WebSocket endpoint for live run progress now, or defer until there's a UI consumer?
  • The run status enum display issue needs fixing before any external consumers use the API.