
Real Feeds Are Flowing

March 26, 2026 — Phase 4 Build

The Question

We have infrastructure deployed and transforms written — but does the whole thing actually work end-to-end? Can we get real Sound Transit feeds flowing from PIMS through the pipeline and out the other side as transformed GTFS-RT that consumers can hit?

What We Tried

  • Copied PIMS API tokens from the experiments project into the production Secret Manager
  • Seeded 6 PIMS assets (prod + QA: VehiclePositions, TripUpdates, schedule) with fetch schedules, auth headers, and version strategies
  • Configured pipeline slot bindings: RT pipeline consumes VP + TU, schedule pipeline consumes the schedule feed
  • Built the Sound Transit transforms as a separate agency repo (sound-transit-gtfs-pipelines) that depends on the continuous-gtfs framework as a library
  • Deployed the pipeline worker image with baked-in transforms to the Cloud Run worker pool

What We Found

  1. The worker was completing runs in 0.6ms but producing no output. It received raw protobuf bytes in the slot data but passed them directly to transforms that expected a decoded FeedMessage. The transforms silently did nothing on raw bytes — no error, no crash. We had to add a decode step in the worker that merges VP + TU into a single FeedMessage before running the DAG.
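The shape of that fix can be sketched in a few lines. This is a hypothetical reconstruction, not the actual worker code: the names (`run_dag`, the dict-shaped feed) are assumptions, the real worker would decode with the GTFS-RT protobuf bindings rather than an injected `decode` callable, and the explicit type guard is an addition that turns the original silent no-op into a loud error.

```python
def run_dag(slots, decode, transforms):
    """Decode each slot's raw bytes, merge VP + TU entities into one
    feed, then run the transform DAG on the decoded result."""
    merged = {"header": None, "entity": []}
    for name, raw in slots.items():
        # Guard: this is exactly the bug described above. Passing raw
        # bytes straight to transforms used to fail silently.
        if not isinstance(raw, (bytes, bytearray)):
            raise TypeError(f"slot {name!r}: expected raw bytes, got {type(raw).__name__}")
        msg = decode(raw)                       # bytes -> decoded message
        merged["header"] = msg["header"]        # last header wins
        merged["entity"].extend(msg["entity"])  # VP + TU entities together
    feed = merged
    for transform in transforms:
        feed = transform(feed)                  # transforms see a decoded feed
    return feed
```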

  2. Service-to-service auth was the missing piece for production. The worker connected fine locally but couldn't reach the Cloud Run orchestrator, which requires TLS and an ID token from the metadata server. The fix was 20 lines of code (exactly as the deployment spec predicted), but the token has to auto-refresh via a gRPC AuthMetadataPlugin; otherwise workers running longer than an hour lose auth when the token expires.
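The refresh logic can be sketched as a small token cache (class and parameter names are assumptions, not the actual code). In production, `fetch_token` would call the GCE metadata server for an ID token with the orchestrator URL as the audience, and the cache would sit behind a `grpc.AuthMetadataPlugin` so every RPC carries a fresh `authorization: Bearer <token>` header.

```python
import time

class RefreshingIdToken:
    """Caches a Google ID token and refreshes it before expiry, so
    long-lived workers never present a stale token (ID tokens expire
    after roughly an hour)."""

    def __init__(self, fetch_token, ttl_seconds=3000, clock=time.monotonic):
        self._fetch = fetch_token      # e.g. a metadata-server call in prod
        self._ttl = ttl_seconds        # refresh margin inside the ~1h lifetime
        self._clock = clock            # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def token(self):
        # Refresh on first use and whenever the cached token has aged out.
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = self._clock() + self._ttl
        return self._token
```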

  3. The agency repo needs its own container image. Transforms are "baked in" per the spec — the pipeline version IS the container image. The framework repo provides the base (continuous-gtfs), the agency repo adds its transforms and builds on top. This forced us to solve a Docker build context problem: the local path dependency can't be resolved inside Docker, so the build script copies the framework in temporarily.
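The build workaround amounts to "stage, build, clean up". This is a hypothetical sketch of what such a build helper could look like, not the actual `bin/build` script; the staging directory name and function signature are assumptions.

```python
import shutil
import subprocess
from pathlib import Path

def build_agency_image(agency_dir, framework_dir, tag, runner=subprocess.run):
    """Docker can't follow a local-path dependency outside the build
    context, so copy the framework repo into the agency repo for the
    duration of the build, then remove it."""
    staged = Path(agency_dir) / ".continuous-gtfs"
    if staged.exists():
        shutil.rmtree(staged)
    shutil.copytree(framework_dir, staged)   # temporary copy into the context
    try:
        runner(["docker", "build", "-t", tag, str(agency_dir)], check=True)
    finally:
        shutil.rmtree(staged)                # always remove the staged copy
```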

  4. Schedule feed dedup works perfectly. The content-hash strategy catches identical schedule exports from PIMS — "content unchanged (dedup)" appears in the logs every 5 minutes. No unnecessary pipeline runs.
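The content-hash strategy reduces to comparing a digest of the fetched bytes against the last registered version. A minimal sketch (class name assumed, not the framework's actual API):

```python
import hashlib

class ContentHashDedup:
    """Registers a new version only when the payload's hash changes;
    identical schedule exports from PIMS are skipped."""

    def __init__(self):
        self._last_digest = None

    def is_new(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self._last_digest:
            return False          # "content unchanged (dedup)": no pipeline run
        self._last_digest = digest
        return True
```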

  5. VP + TU debounce collapses correctly. Both feeds arrive within milliseconds of each other every 20 seconds. The 2-second debounce window collapses them into a single pipeline run that processes both in one pass.
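The debounce behavior above can be sketched as a window that the first arrival opens and later arrivals join (names and the `poll` API are assumptions; this is an illustration, not the orchestrator's code).

```python
import time

class DebounceWindow:
    """Collapses slot updates arriving within the window into one run:
    VP + TU landing milliseconds apart trigger a single pipeline run."""

    def __init__(self, window_seconds=2.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock          # injectable for testing
        self._opened_at = None
        self._pending = set()

    def arrive(self, slot):
        # The first arrival opens the window; later arrivals join it.
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._pending.add(slot)

    def poll(self):
        # Once the window has elapsed, emit one run covering every slot
        # that arrived inside it; otherwise keep waiting.
        if self._opened_at is None or self._clock() - self._opened_at < self._window:
            return None
        run, self._pending, self._opened_at = set(self._pending), set(), None
        return run
```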

What It Looks Like

Real PIMS data flowing through the system every 20 seconds:

Fetch pims/production/vehicle_positions: new version 2026-03-26T0... (12838 bytes, 29ms)
Fetch pims/production/trip_updates: new version 2026-03-26T0... (23948 bytes, 37ms)
Dispatch: run=8d62190f pipeline=st-realtime-production feed_type=realtime slots=['trip_updates', 'vehicle_positions']
Run 8d62190f completed in 1.5ms
Published production/feed.pb (23587 bytes, public, max-age=1)
Published production/feed.json (147847 bytes, public, max-age=1)
Registered output asset output/st-realtime-production/feed.pb version cbf50c67d86e...

The transformed feed, served via CDN:

$ curl -s http://34.36.64.110/production/feed.json | jq '.header'
{
  "gtfs_realtime_version": "2.0",
  "timestamp": "1774498531"
}

$ curl -s -I http://34.36.64.110/production/feed.pb | grep Cache-Control
Cache-Control: public, max-age=1

Note

The gtfs_realtime_version: "2.0" in the header is our UpdateFeedHeader transform at work — PIMS sends version 1.0.
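In spirit, that transform is a one-liner over the decoded feed. The sketch below uses a plain dict as a stand-in for the decoded FeedMessage; the function name mirrors the transform but the body is illustrative, not the actual implementation.

```python
def update_feed_header(feed):
    """Rewrite the header's version field: PIMS emits gtfs_realtime_version
    "1.0", while the published feed advertises "2.0"."""
    feed["header"]["gtfs_realtime_version"] = "2.0"
    return feed
```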

The Decision

The pipeline is live and processing real Sound Transit feeds end-to-end. PIMS data is fetched every 20 seconds, transformed in <2ms, and published to CDN with 1-second cache headers — matching the latency and staleness targets from our experiments.

What This Means

  • Consumers can hit http://34.36.64.110/production/feed.pb for transformed protobuf or feed.json for JSON — right now, today
  • The agency repo pattern works — Sound Transit's transforms live in their own repo, built as a separate Docker image
  • Schedule dedup prevents unnecessary reprocessing when PIMS exports haven't changed
  • Output assets are registered in the registry, closing the loop for schedule→RT dependency (when schedule transforms are wired up)
  • The system has processed thousands of runs already with no errors

Open Questions

  • DNS and HTTPS for the CDN endpoint — currently HTTP on a raw IP
  • The schedule pipeline isn't running yet (only RT) — needs a schedule-specific worker or the same worker with PIPELINE_DIR pointed at the schedule folder
  • How should the agency repo's CI build and push images? Currently using a manual bin/build script
  • Should we publish continuous-gtfs as a real Python package to avoid the Docker build workaround?