How Do We Deploy Two Services That Talk to Each Other?

March 21, 2026 — Experiment: deployment-pattern

Updated

This two-service experiment evolved into a three-service architecture. The production infrastructure is defined in 012: What Does the Production Infrastructure Look Like? — 44 OpenTofu resources covering VPC, Cloud SQL, Cloud Run, CDN, and CI/CD.

The Question

Our architecture calls for two separate services: a pipeline service that runs GTFS transforms, and a control console that lets staff monitor and manage it. The pipeline service needs to be internal (no public access), and the console needs to call it over gRPC. How do we deploy both to Cloud Run, wire them together, and prove the framework we built in the pipeline-dag experiment actually works when containerized?

What We Tried

  • Two Docker containers built from the same codebase — one running the pipeline-dag Step framework with a gRPC interface, one running a FastAPI control console that calls the pipeline over gRPC
  • OpenTofu managing Artifact Registry, both Cloud Run services, IAM bindings, and service-to-service environment variables
  • Three-step deploy flow: create registry, build and push images, deploy services

What We Found

  1. The Step framework works perfectly in a container — the pipeline service scans its pipeline folder on startup, resolves the DAG, and serves it over gRPC exactly as it does locally. Four steps loaded, DAG edges resolved, execution works. Zero changes to the framework code.

  2. Cloud Run's "internal only" ingress doesn't work the way you'd expect — setting a service to internal-only means it can only receive traffic from within the VPC. But Cloud Run services send outbound traffic through the public internet by default. So our "internal" pipeline was unreachable from our own console service. The fix: use IAM to control who can call the service, and leave ingress open.

    Resolved in host-orchestration

    This was resolved by adding a VPC connector. With a VPC connector and ALL_TRAFFIC egress on the calling service, internal-only ingress works correctly. See 009.

  3. Service-to-service gRPC on Cloud Run requires three things most tutorials skip — TLS channel credentials (Cloud Run terminates TLS), a Google ID token (Cloud Run enforces IAM), and the full .run.app hostname (not just the service name). Getting all three right took more iteration than the actual service code.
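A minimal sketch of those three pieces, assuming grpcio and google-auth; the hostname and helper names are illustrative, not the project's actual client code:

```python
# Sketch of the three requirements for calling a Cloud Run gRPC service.
# Assumes grpcio and google-auth are installed when make_channel is called.

def grpc_target(host: str) -> str:
    # (3) Full .run.app hostname on port 443, not the bare service name.
    return f"{host}:443"

def make_channel(host: str):
    import grpc
    import google.auth.transport.requests
    import google.oauth2.id_token

    # (1) TLS channel credentials: Cloud Run terminates TLS, so a
    # plaintext channel is rejected.
    tls = grpc.ssl_channel_credentials()

    # (2) A Google ID token with the service URL as audience: Cloud Run
    # enforces IAM (roles/run.invoker) on every request.
    token = google.oauth2.id_token.fetch_id_token(
        google.auth.transport.requests.Request(), f"https://{host}"
    )
    call_creds = grpc.access_token_call_credentials(token)

    return grpc.secure_channel(
        grpc_target(host),
        grpc.composite_channel_credentials(tls, call_creds),
    )
```

Miss any one of the three and the failure modes are confusingly different: a plaintext channel hangs, a missing token gets a 403, and a bare service name simply doesn't resolve.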

  4. First cold-start gRPC call takes 5-15 seconds, but warm calls are fast — the pipeline service needs ~11 seconds to start, load the framework, scan the pipeline folder, and resolve the DAG. Once warm, the full roundtrip (host HTTP → gRPC to pipeline → response) is 160ms at p50, 192ms at p95 over 100 iterations. Fine for a management console, but the cold start rules out scale-to-zero for latency-sensitive paths.
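The warm-path numbers came from repeated timed roundtrips; a sketch of that measurement loop, where the `call` argument stands in for the host HTTP → gRPC → response roundtrip:

```python
import statistics
import time

def measure(call, iterations: int = 100) -> dict:
    """Time `call` repeatedly and report p50/p95 latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "p50": statistics.median(samples),
        # quantiles(n=100) yields the 99 percentile cut points;
        # index 94 is the 95th percentile.
        "p95": statistics.quantiles(samples, n=100)[94],
    }
```

Running the first (cold) call outside the loop is what separates the ~11-second startup from the 160ms/192ms warm figures.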

  5. Building on Apple Silicon for Cloud Run needs two flags: --platform linux/amd64 (obvious) and --provenance=false (not obvious — without it, Docker produces an OCI manifest index that Cloud Run can't resolve to an amd64 image).

What It Looks Like

The control console shows pipeline health, loaded steps, and run history:

Pipeline status: healthy
Steps loaded: 4
Last run: 2026-03-21T20:36:01Z

Pipeline Steps:
  • update_feed_info
  • merge_stops
  • remove_unused_stops
  • validate_stop_hierarchy

The DAG endpoint returns ReactFlow JSON ready for the eventual UI:

{
  "nodes": [
    {"id": "update_feed_info", "type": "builtin", "builtin": "UpdateFeedInfo"},
    {"id": "merge_stops", "type": "builtin", "builtin": "MergeStops"},
    {"id": "remove_unused_stops", "type": "custom"},
    {"id": "validate_stop_hierarchy", "type": "custom"}
  ],
  "edges": [
    {"source": "update_feed_info", "target": "merge_stops"},
    {"source": "merge_stops", "target": "remove_unused_stops"},
    {"source": "remove_unused_stops", "target": "validate_stop_hierarchy"}
  ]
}
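That payload shape is straightforward to derive from the resolved DAG. A hypothetical sketch (the real framework's types and the per-node type/builtin fields are omitted):

```python
def to_reactflow(deps: dict[str, list[str]]) -> dict:
    """deps maps each step name to the names of its upstream steps."""
    return {
        # One ReactFlow node per step.
        "nodes": [{"id": name} for name in deps],
        # One edge per dependency, pointing upstream -> downstream.
        "edges": [
            {"source": upstream, "target": name}
            for name, upstreams in deps.items()
            for upstream in upstreams
        ],
    }
```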

Triggering a pipeline run executes all steps and returns per-step results:

{
  "run_id": "99d38dcd",
  "status": "completed",
  "step_results": [
    {"step_name": "update_feed_info", "status": "success"},
    {"step_name": "merge_stops", "status": "success"},
    {"step_name": "remove_unused_stops", "status": "success"},
    {"step_name": "validate_stop_hierarchy", "status": "success"}
  ]
}

The Decision

Deploy as two Cloud Run services with IAM-based access control and gRPC communication. The pipeline service runs the real Step framework, the control console connects via authenticated gRPC, and OpenTofu manages the entire deployment. VPC networking is not needed — IAM is simpler and sufficient.

Evolved to three-service architecture

This two-service model was the starting point. The host-orchestration experiment evolved it to three services (web + orchestrator + pipeline worker pools) with VPC networking, an asset registry, and worker pools that connect outbound to the orchestrator. See the architecture overview.

What This Means

  • The deployment story is fully automated: tofu apply creates everything, bin/build pushes images, one more tofu apply deploys
  • The pipeline-dag framework is validated end-to-end: from agency pipeline folder → container → gRPC → control console
  • The control console can render the DAG, show pipeline status, and trigger runs today — the UI just needs to be built on top of these APIs
  • Service-to-service auth patterns are proven and documented for when we add more services

Open Questions

  • Should pipeline containers be worker pools instead of services? Cloud Run recently added worker pools — a resource type with no public endpoint, fixed instance count (no cold starts), and VPC-native networking. Our pipeline container is a pure compute worker that shouldn't reach the internet at all. Worker pools would eliminate the IAM auth workaround, the cold-start problem, and the public-endpoint exposure in one move. See the host-orchestration experiment spec for the proposed follow-up.
  • Who polls the feeds — the host or the pipeline? We think the host should be the orchestrator: it polls feed sources on a schedule, then dispatches transform work to pipeline containers via gRPC. Pipelines receive data in, return results out, never touch the internet. This matters because we'll have a fixed number of feed sources but potentially many pipeline configurations per feed (different environments, different transform sets). The host handles the small N (feeds), the workers handle the variable M (configs).
  • How do we handle image tagging for production? We're using :latest for the experiment. Production will need immutable tags (git SHA or semantic version) and a strategy for rolling back.