
Continuous GTFS — Architecture Overview

System Architecture

Three Cloud Run services plus managed infrastructure, all on Google Cloud Platform.

```mermaid
flowchart TB
  subgraph Internet
    Consumers[Feed Consumers]
    Operators[Agency Operators]
  end
  subgraph GCP["Google Cloud Platform"]
    subgraph Public
      CDN[Cloud CDN + LB]
      Web[Web Service<br/>Bun + Fastify<br/>REST API]
    end
    subgraph VPC["Private VPC"]
      Orch[Orchestrator<br/>Bun + gRPC<br/>Asset Registry]
      subgraph Workers["Pipeline Workers (1 per version)"]
        W1[pipeline:7f3a<br/>production]
        W2[pipeline:c91e<br/>staging]
        Wn[pipeline:...<br/>dev, feature branches, etc.]
      end
      DB[(Cloud SQL<br/>PostgreSQL 18)]
    end
    GCS[Cloud Storage<br/>Published Feeds]
    SM[Secret Manager<br/>API Tokens]
  end
  subgraph External["External Data Sources"]
    PIMS_Prod[PIMS Production<br/>api.soundtransit.org]
    PIMS_QA[PIMS QA<br/>api-beta.soundtransit.org]
  end
  Consumers -->|HTTPS| CDN
  CDN -->|origin| GCS
  Operators -->|HTTPS| Web
  Web -->|gRPC + IAM auth| Orch
  W1 -->|gRPC outbound| Orch
  W2 -->|gRPC outbound| Orch
  Wn -.->|gRPC outbound| Orch
  Orch -->|SQL| DB
  Orch -->|write| GCS
  Orch -->|fetch secrets| SM
  Orch -->|fetch feeds| PIMS_Prod
  Orch -->|fetch feeds| PIMS_QA
```
Component Responsibilities
  • Web Service: Public REST API for the management UI. Stateless gateway — forwards all operations to the orchestrator over gRPC.
  • Orchestrator: Asset registry (tracking data sources + versions), scheduled feed fetching with secret resolution, in-memory caching, per-pipeline debounce, worker pool tracking, pipeline dispatch, output registration, GCS publishing.
  • Pipeline Workers: Stateless transform execution. Receive feed data, run schedule or RT transforms, stream output artifacts back. No internet access, no state, no scheduling.
  • Cloud SQL: Asset definitions, version history, pipeline slot bindings, run records.
  • Cloud Storage: Published feeds (CDN origin), asset version persistence.
  • CDN + Load Balancer: Feed serving to consumers, with a 1-second cache TTL for RT feeds and a longer TTL for schedule feeds.
  • Secret Manager: PIMS API tokens, resolved by the orchestrator at fetch time.
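
The fetch-time secret resolution can be sketched roughly as follows. This is an illustrative sketch, not the orchestrator's actual API: the function name, resolver type, and the exact `${...}` placeholder syntax are assumptions based on the `${secret}` templates mentioned in this document.

```typescript
// Sketch: resolving "${...}" secret placeholders in a feed source's auth
// header before fetching. Names and template syntax are illustrative.
type SecretResolver = (name: string) => string;

function resolveSecretTemplates(template: string, resolve: SecretResolver): string {
  // Replace every ${secret-name} placeholder with the resolved secret value.
  return template.replace(/\$\{([^}]+)\}/g, (_, name: string) => resolve(name));
}

// Example with a stubbed resolver standing in for Secret Manager:
const secrets: Record<string, string> = { "pims-qa": "tok-123" };
const header = resolveSecretTemplates("Bearer ${pims-qa}", n => secrets[n] ?? "");
// header is now "Bearer tok-123"
```

Keeping resolution at fetch time (rather than baking tokens into config) means rotating a token in Secret Manager takes effect on the next fetch.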

Network Architecture and Security Boundaries

```mermaid
flowchart LR
  subgraph "Public Internet"
    Users[API Clients]
    FeedConsumers[Feed Consumers]
  end
  subgraph "Cloud Run Public"
    Web[Web Service<br/>port 8080 HTTP<br/>public ingress]
  end
  subgraph "Cloud Run Internal"
    Orch[Orchestrator<br/>port 8080 gRPC h2c<br/>internal-only ingress]
  end
  subgraph "Cloud Run Worker Pools"
    W1[pipeline:7f3a<br/>no ingress<br/>no internet egress]
    W2[pipeline:c91e<br/>no ingress<br/>no internet egress]
    Wn[pipeline:...<br/>N versions]
  end
  subgraph "VPC Network"
    Connector[VPC Connector<br/>10.0.1.0/28]
    SQL[(Cloud SQL<br/>10.15.0.3<br/>private IP only)]
  end
  subgraph "Google APIs"
    SM[Secret Manager]
    GCS[Cloud Storage]
    CDN[CDN + LB]
  end
  Users -->|HTTPS| Web
  FeedConsumers -->|HTTPS| CDN
  CDN -->|origin| GCS
  Web -->|VPC connector<br/>IAM ID token| Orch
  W1 -->|VPC connector<br/>IAM ID token| Orch
  W2 -->|VPC connector<br/>IAM ID token| Orch
  Wn -->|VPC connector<br/>IAM ID token| Orch
  Orch -->|private IP via VPC| SQL
  Orch -->|public internet| SM
  Orch -->|public internet| GCS
```

Security Boundaries

| Boundary | Enforcement | Details |
| --- | --- | --- |
| Internet → Web | Cloud Run IAM | Public access (allUsers invoker) |
| Web → Orchestrator | IAM + VPC | Web's service account has the run.invoker role. Traffic routes through the VPC connector. The orchestrator rejects non-VPC traffic. |
| Workers → Orchestrator | IAM + VPC | The worker service account has the run.invoker role. Workers connect outbound through the VPC. |
| Orchestrator → Cloud SQL | VPC private IP | Database is only accessible on the VPC. No public IP. |
| Orchestrator → Secret Manager | IAM | Service account has the secretmanager.secretAccessor role. |
| Orchestrator → PIMS | Bearer token | Tokens stored in Secret Manager, resolved at runtime via ${secret} templates. |
| Workers → Internet | Blocked | Worker pools have no internet egress route. All data arrives from the orchestrator via gRPC. |
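
For the IAM-enforced hops, the caller mints an ID token for the receiving service and attaches it to the request. On Cloud Run this is typically fetched from the instance metadata server; the sketch below assumes that standard mechanism (helper names are illustrative):

```typescript
// Sketch: minting an IAM ID token from the Cloud Run metadata server for
// service-to-service calls (e.g. web -> orchestrator). The audience is the
// receiving service's URL. Helper names are illustrative.
const METADATA_BASE =
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity";

function identityTokenUrl(audience: string): string {
  return `${METADATA_BASE}?audience=${encodeURIComponent(audience)}`;
}

async function fetchIdToken(audience: string): Promise<string> {
  // Only reachable from inside GCP; the Metadata-Flavor header is required.
  const res = await fetch(identityTokenUrl(audience), {
    headers: { "Metadata-Flavor": "Google" },
  });
  if (!res.ok) throw new Error(`metadata server returned ${res.status}`);
  return res.text(); // attached as authorization metadata on the gRPC call
}
```

Cloud Run then verifies the token's signature and audience before the request ever reaches the orchestrator process, which is what makes the run.invoker role binding the effective access-control list.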

Data Flow — Realtime Feeds

```mermaid
sequenceDiagram
  participant PIMS as PIMS ESB
  participant SM as Secret Manager
  participant Orch as Orchestrator
  participant Cache as In-Memory Cache
  participant DB as PostgreSQL
  participant W1 as Worker (pipeline:7f3a)
  participant W2 as Worker (pipeline:c91e)
  participant GCS as Cloud Storage
  participant CDN as CDN
  Note over Orch: Every ~20 seconds
  Orch->>SM: Resolve ${secret} for pims-qa
  SM-->>Orch: Bearer token
  Orch->>PIMS: GET /VehiclePosition (Bearer auth)
  PIMS-->>Orch: Protobuf bytes (6KB)
  Orch->>Cache: Store in memory
  Orch->>DB: Insert asset_version (timestamp)
  Note over Orch: Debounce 500ms, then fan out
  par Production pipeline
    Orch->>W1: ExecuteTransform (slot data from cache)
    W1-->>Orch: Stream feed.pb (passthrough)
    W1-->>Orch: Stream feed.json
    Orch->>DB: Register output asset
    Orch->>GCS: Publish production/vehicle_positions.pb
  and Staging pipeline
    Orch->>W2: ExecuteTransform (slot data from cache)
    W2-->>Orch: Stream feed.pb (renamed vehicles, filtered stops)
    W2-->>Orch: Stream feed.json
    Orch->>DB: Register output asset
    Orch->>GCS: Publish staging/vehicle_positions.pb
  end
  CDN->>GCS: Origin pull (max-age=1)
  Note over CDN: Consumer reads within ~112ms
```

Data Flow — Schedule Feeds

```mermaid
sequenceDiagram
  participant PIMS as PIMS ESB
  participant Orch as Orchestrator
  participant Cache as In-Memory Cache
  participant DB as PostgreSQL
  participant W as Worker
  participant GCS as Cloud Storage
  Note over Orch: Periodic or on push
  Orch->>PIMS: GET /schedule (Bearer auth)
  PIMS-->>Orch: ZIP bytes (1.5MB)
  Orch->>Orch: SHA-256 hash content
  alt Hash matches latest version
    Note over Orch: Skip — feed unchanged
  else New content
    Orch->>Cache: Store in memory
    Orch->>DB: Insert asset_version (content_hash)
    Orch->>W: ExecuteTransform (schedule zip from cache)
    W-->>Orch: Stream schedule.zip (transformed)
    Orch->>DB: Register output asset
    Orch->>GCS: Publish production/schedule.zip
    Note over Orch: Output available as input<br/>to RT pipeline for<br/>schedule-dependent transforms
  end
```
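
The dedup step in this flow reduces to hashing the fetched bytes and comparing against the latest stored version's hash. A minimal sketch (function names are illustrative):

```typescript
// Sketch: SHA-256 content-hash dedup for schedule fetches. A fetch whose
// bytes hash to the latest stored version is skipped. Names are illustrative.
import { createHash } from "node:crypto";

function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the fetched bytes differ from the latest version (or none exists),
// i.e. a new asset_version row should be inserted and the pipeline dispatched.
function isNewContent(bytes: Uint8Array, latestHash: string | null): boolean {
  return contentHash(bytes) !== latestHash;
}

const fetched = new TextEncoder().encode("fake schedule zip bytes");
const first = isNewContent(fetched, null);                   // true: no prior version
const repeat = isNewContent(fetched, contentHash(fetched));  // false: unchanged
```

Hashing the raw bytes (rather than comparing timestamps) is what lets the orchestrator poll the schedule endpoint frequently while only versioning and transforming on actual change.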

Code Flow — Pipeline Versioning

```mermaid
flowchart TB
  subgraph "Agency Repo"
    Code[Pipeline Code<br/>Python loose files]
  end
  subgraph "CI/CD"
    Hash[Compute tree hash<br/>of pipeline folder]
    Build[Build container image]
    Push[Push to Artifact Registry<br/>tag = tree hash]
  end
  subgraph GCP
    AR[Artifact Registry<br/>pipeline:7f3a...<br/>pipeline:c91e...<br/>pipeline:...]
    WP1[Worker Pool<br/>pipeline-7f3a<br/>→ production env]
    WP2[Worker Pool<br/>pipeline-c91e<br/>→ staging env]
    WPn[Worker Pool<br/>pipeline-...<br/>→ dev env, etc.]
    Orch[Orchestrator<br/>tracks connected versions]
  end
  Code -->|git push| Hash
  Hash -->|content-addressed| Build
  Build --> Push
  Push --> AR
  AR -->|deploy| WP1
  AR -->|deploy| WP2
  AR -.->|deploy| WPn
  WP1 -->|register 7f3a| Orch
  WP2 -->|register c91e| Orch
  WPn -.->|register ...| Orch
  style WP1 fill:#d4edda
  style WP2 fill:#fff3cd
  style WPn fill:#f0f0f0,stroke-dasharray: 5 5
```

Key Principles

  • Pipeline version = the transforms. No runtime config selection. Different environments run different container images with different code baked in.
  • Content-addressed images. Same pipeline code from different branches produces the same image tag. No unnecessary rebuilds.
  • Instant rollback. Point an environment to a previous tree hash. Old images stay in Artifact Registry.
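
The content-addressed tag can be sketched as a deterministic hash over the pipeline folder's sorted file paths and contents. This is a simplified stand-in for an actual tree-hash algorithm (e.g. git's), just to show why identical code yields identical tags:

```typescript
// Sketch: a deterministic "tree hash" over (path, content) pairs, so the
// same pipeline code always yields the same image tag regardless of branch.
// Simplified stand-in; not git's actual tree-hash format.
import { createHash } from "node:crypto";

function treeHash(files: Record<string, string>): string {
  const h = createHash("sha256");
  // Sort paths so the hash is independent of enumeration order.
  for (const path of Object.keys(files).sort()) {
    h.update(path);
    h.update("\0"); // separator keeps path/content boundaries unambiguous
    h.update(files[path]);
    h.update("\0");
  }
  return h.digest("hex").slice(0, 8); // short tag in the style of "7f3a..."
}

const a = treeHash({ "transform.py": "def run(): ...", "util.py": "x = 1" });
const b = treeHash({ "util.py": "x = 1", "transform.py": "def run(): ..." });
// a === b: same content, same tag, even with different enumeration order
```

Because the tag is a pure function of the code, CI can skip the build entirely when the tag already exists in Artifact Registry.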

Environment Configuration

An environment ties together a pipeline version, input data sources, and an output destination. The orchestrator manages these bindings in the asset registry.

```mermaid
flowchart LR
  subgraph "Production Environment"
    direction TB
    PV1[Pipeline: 7f3a\npassthrough transforms]
    PI1[Inputs:\npims/prod/vehicle_positions\npims/prod/trip_updates\npims/prod/schedule]
    PO1[Outputs:\noutput/st-realtime-production/\noutput/st-schedule-production/]
    PI1 --> PV1 --> PO1
  end
  subgraph "Staging Environment"
    direction TB
    PV2[Pipeline: c91e\nrename + filter transforms]
    PI2[Inputs:\npims/qa/vehicle_positions\npims/qa/trip_updates\npims/qa/schedule]
    PO2[Outputs:\noutput/st-realtime-staging/\noutput/st-schedule-staging/]
    PI2 --> PV2 --> PO2
  end
  subgraph "Dev Environment (example)"
    direction TB
    PV3[Pipeline: a3b4\nbranch build]
    PI3[Inputs:\npims/qa/*\nor custom push]
    PO3[Outputs:\noutput/st-realtime-dev-chris/]
    PI3 --> PV3 --> PO3
  end
  PIMS_Prod[PIMS Prod API\npims-prod secret] --> PI1
  PIMS_QA[PIMS QA API\npims-qa secret] --> PI2
  PIMS_QA --> PI3
```

Each environment is defined by three things in the orchestrator's database:

| Property | Production | Staging | Dev (example) |
| --- | --- | --- | --- |
| Pipeline version | 7f3a (worker pool instance) | c91e (worker pool instance) | a3b4 (worker pool, min_instances=0) |
| Input assets (slot bindings) | pims/prod/* | pims/qa/* | pims/qa/* or push |
| Output path | output/st-*-production/ | output/st-*-staging/ | output/st-*-dev-chris/ |

Note

For illustrative purposes, the diagram shows the staging and dev environments consuming PIMS QA feeds. In practice, Sound Transit's discovery process determined that all pipeline environments would generally consume pims/prod feeds.

  • Adding an environment = deploy a new worker pool + configure asset slot bindings
  • Changing a pipeline version = deploy new worker pool with updated image, update environment mapping
  • Removing an environment = delete the worker pool (or scale to 0 for dev environments)
  • Inputs are independent — production reads from PIMS production feeds with the pims-prod secret, staging reads from PIMS QA feeds with the pims-qa secret. A single fetch fans out to all environments that consume that asset.
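
An environment record from the table above might be represented roughly like this. The field names are illustrative, not the actual database schema:

```typescript
// Sketch: an environment record tying pipeline version, input slot bindings,
// and output path together. Field names are illustrative, not the real schema.
interface Environment {
  name: string;
  pipelineVersion: string; // tree hash, i.e. the worker pool's image tag
  inputs: string[];        // asset slot bindings
  outputPrefix: string;    // GCS publish destination
}

const production: Environment = {
  name: "production",
  pipelineVersion: "7f3a",
  inputs: [
    "pims/prod/vehicle_positions",
    "pims/prod/trip_updates",
    "pims/prod/schedule",
  ],
  outputPrefix: "output/st-realtime-production/",
};

// A single fetch of an asset fans out to every environment whose slot
// bindings include that asset:
function consumersOf(asset: string, envs: Environment[]): Environment[] {
  return envs.filter(e => e.inputs.includes(asset));
}
```

The fan-out lookup is what makes fetches shared: one PIMS request per asset, however many environments consume it.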

Component Responsibilities

| Component | Owns | Does NOT own |
| --- | --- | --- |
| Web | REST API, health endpoint | State, data, scheduling |
| Orchestrator | Asset registry, versioning, caching, debounce, worker pool tracking, dispatch, output registration, GCS publish, secret resolution | Transform logic |
| Workers | Transform execution (schedule + RT), streaming output | Scheduling, fetching, state, internet access |
| Cloud SQL | Asset definitions, versions, pipeline slots, run history | Feed content (in-memory cache only) |
| GCS | Published feeds (CDN origin), asset persistence | Access control (CDN handles it) |
| CDN | Feed serving, caching, spike protection | Data freshness (origin controls it via Cache-Control) |
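
The freshness split in the table above comes down to the Cache-Control metadata set when objects are published to GCS. A minimal sketch: the 1-second RT TTL is stated in this document, but the schedule TTL below is an assumed placeholder, since only "longer" is specified.

```typescript
// Sketch: choosing Cache-Control at publish time. max-age=1 for RT feeds is
// from this document; the schedule value is an assumed placeholder.
type FeedKind = "realtime" | "schedule";

function cacheControlFor(kind: FeedKind): string {
  // RT feeds: 1-second TTL so the CDN re-pulls from GCS almost immediately.
  // Schedule feeds change rarely, so a longer TTL is safe (value assumed).
  return kind === "realtime" ? "public, max-age=1" : "public, max-age=60";
}
```

Setting the header on the origin object keeps the CDN purely mechanical: it never needs to know which feeds are fresh, only to honor the TTL the orchestrator wrote.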

Validated Performance

| Metric | Value |
| --- | --- |
| RT transform pipeline | 10-23 ms p95 |
| Schedule transform pipeline | 833-920 ms total |
| CDN write-to-read latency | 112 ms p50 |
| gRPC service-to-service | 160 ms p50 |
| Content-hash dedup | 100% accuracy |
| Debounce fan-out | Correct to all consuming pipelines |