# Real Feeds Are Flowing

*March 26, 2026 — Phase 4 Build*
## The Question
We have infrastructure deployed and transforms written — but does the whole thing actually work end-to-end? Can we get real Sound Transit feeds flowing from PIMS through the pipeline and out the other side as transformed GTFS-RT that consumers can hit?
## What We Tried
- Copied PIMS API tokens from the experiments project into the production Secret Manager
- Seeded 6 PIMS assets (prod + QA: VehiclePositions, TripUpdates, schedule) with fetch schedules, auth headers, and version strategies
- Configured pipeline slot bindings: RT pipeline consumes VP + TU, schedule pipeline consumes the schedule feed
- Built the Sound Transit transforms as a separate agency repo (`sound-transit-gtfs-pipelines`) that depends on the `continuous-gtfs` framework as a library
- Deployed the pipeline worker image with baked-in transforms to the Cloud Run worker pool
## What We Found
- The worker was completing runs in 0.6ms but producing no output. It received raw protobuf bytes in the slot data and passed them directly to transforms that expected a decoded FeedMessage. The transforms silently did nothing on raw bytes — no error, no crash. We had to add a decode step in the worker that merges VP + TU into a single FeedMessage before running the DAG.
- Service-to-service auth was the missing piece for production. The worker connected fine locally but couldn't reach the Cloud Run orchestrator, which requires TLS and an ID token from the metadata server. The fix was 20 lines of code (exactly as the deployment spec predicted), but the token has to auto-refresh via a gRPC AuthMetadataPlugin, or workers running longer than an hour would lose auth.
- The agency repo needs its own container image. Transforms are "baked in" per the spec — the pipeline version IS the container image. The framework repo provides the base (`continuous-gtfs`); the agency repo adds its transforms and builds on top. This forced us to solve a Docker build context problem: the local path dependency can't be resolved inside Docker, so the build script temporarily copies the framework into the build context.
- Schedule feed dedup works perfectly. The content-hash strategy catches identical schedule exports from PIMS — "content unchanged (dedup)" appears in the logs every 5 minutes, and no unnecessary pipeline runs are dispatched.
- VP + TU debounce collapses correctly. Both feeds arrive within milliseconds of each other every 20 seconds, and the 2-second debounce window collapses them into a single pipeline run that processes both in one pass.
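The decode-and-merge step looks roughly like the sketch below. In the real worker, decoding is just `FeedMessage.ParseFromString(raw)` from the GTFS-RT protobuf bindings; here we use dataclass stand-ins for the protobuf types so the merge logic is visible on its own. All names are illustrative, not the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for the decoded protobuf types. The real worker uses
# gtfs_realtime_pb2.FeedMessage and decodes slot bytes with ParseFromString
# before any transform runs — skipping that step was the silent no-op bug.
@dataclass
class FeedEntity:
    id: str
    payload: bytes  # vehicle position or trip update body

@dataclass
class FeedMessage:
    timestamp: int
    entity: List[FeedEntity] = field(default_factory=list)

def merge_feeds(vp: FeedMessage, tu: FeedMessage) -> FeedMessage:
    """Merge decoded VehiclePositions and TripUpdates into one FeedMessage:
    take the newer header timestamp and concatenate the entity lists, so the
    transform DAG sees a single combined feed."""
    merged = FeedMessage(timestamp=max(vp.timestamp, tu.timestamp))
    merged.entity.extend(vp.entity)
    merged.entity.extend(tu.entity)
    return merged
```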
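The auto-refresh logic behind the auth fix can be sketched as a token cache that refetches before expiry. This is a minimal illustration with an injected fetcher and clock so it is testable offline; in the real worker the cache backs a `grpc.AuthMetadataPlugin` that attaches `authorization: Bearer <token>` metadata, and the fetcher hits the GCE metadata server's identity endpoint. Class and parameter names here are ours.

```python
import time
from typing import Callable, Optional, Tuple

class RefreshingIdToken:
    """Cache a Google-signed ID token and refresh it shortly before expiry,
    so workers that run longer than the token lifetime (~1h) never present
    a stale credential."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 now: Callable[[], float] = time.time,
                 skew: float = 300.0):
        self._fetch = fetch   # returns (token, expiry as unix seconds)
        self._now = now
        self._skew = skew     # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expiry = 0.0

    def token(self) -> str:
        # Refetch on first use or once we are inside the skew window.
        if self._token is None or self._now() >= self._expiry - self._skew:
            self._token, self._expiry = self._fetch()
        return self._token
```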
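The content-hash dedup for the schedule feed amounts to comparing a digest of the fetched bytes against the last registered version and skipping dispatch on a match. A minimal sketch (names assumed, not the framework's API):

```python
import hashlib

class ContentHashDedup:
    """Skip pipeline dispatch when a fetched payload is byte-identical to
    the last seen version of an asset — the 'content unchanged (dedup)'
    path in the fetcher logs."""

    def __init__(self) -> None:
        self._last: dict = {}  # asset name -> last content digest

    def is_new(self, asset: str, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if self._last.get(asset) == digest:
            return False  # unchanged: no new version, no pipeline run
        self._last[asset] = digest
        return True
```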
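And the debounce behavior — VP and TU landing milliseconds apart collapsing into one run — can be modeled as a window that opens on the first arrival and is flushed once the window elapses. This is a simplified, single-pipeline sketch under assumed names, not the orchestrator's actual implementation:

```python
from typing import Optional, Set

class DebounceWindow:
    """Collapse slot updates that arrive within `window` seconds into a
    single pipeline run: the first arrival opens the window, later arrivals
    join the pending set, and the dispatcher flushes once the window closes."""

    def __init__(self, window: float = 2.0):
        self.window = window
        self._opened_at: Optional[float] = None
        self._pending: Set[str] = set()

    def arrive(self, slot: str, now: float) -> None:
        if self._opened_at is None:
            self._opened_at = now  # first arrival opens the window
        self._pending.add(slot)

    def due(self, now: float) -> Optional[Set[str]]:
        """Polled by the dispatcher: once the window has elapsed, return the
        collapsed slot set for one run and reset; otherwise return None."""
        if self._opened_at is not None and now - self._opened_at >= self.window:
            slots, self._pending, self._opened_at = self._pending, set(), None
            return slots
        return None
```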
## What It Looks Like
Real PIMS data flowing through the system every 20 seconds:
```
Fetch pims/production/vehicle_positions: new version 2026-03-26T0... (12838 bytes, 29ms)
Fetch pims/production/trip_updates: new version 2026-03-26T0... (23948 bytes, 37ms)
Dispatch: run=8d62190f pipeline=st-realtime-production feed_type=realtime slots=['trip_updates', 'vehicle_positions']
Run 8d62190f completed in 1.5ms
Published production/feed.pb (23587 bytes, public, max-age=1)
Published production/feed.json (147847 bytes, public, max-age=1)
Registered output asset output/st-realtime-production/feed.pb version cbf50c67d86e...
```
The transformed feed, served via CDN:
```
$ curl -s http://34.36.64.110/production/feed.json | jq '.header'
{
  "gtfs_realtime_version": "2.0",
  "timestamp": "1774498531"
}
$ curl -s -I http://34.36.64.110/production/feed.pb | grep Cache-Control
Cache-Control: public, max-age=1
```
**Note:** The `gtfs_realtime_version: "2.0"` in the header is our UpdateFeedHeader transform at work — PIMS sends version 1.0.
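Conceptually that transform is a one-field rewrite on the feed header. The sketch below shows it over the JSON-dict form of the feed for readability; the real transform operates on the decoded protobuf FeedMessage, and the function name and shape here are illustrative, not the actual transform API.

```python
def update_feed_header(feed: dict) -> dict:
    """Hypothetical sketch of an UpdateFeedHeader-style transform: PIMS
    emits gtfs_realtime_version '1.0'; rewrite it to '2.0' on the way out,
    leaving the rest of the header (e.g. timestamp) untouched."""
    header = dict(feed.get("header", {}))       # copy, don't mutate input
    header["gtfs_realtime_version"] = "2.0"
    return {**feed, "header": header}
```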
## The Decision
The pipeline is live and processing real Sound Transit feeds end-to-end. PIMS data is fetched every 20 seconds, transformed in <2ms, and published to CDN with 1-second cache headers — matching the latency and staleness targets from our experiments.
## What This Means
- Consumers can hit `http://34.36.64.110/production/feed.pb` for transformed protobuf or `feed.json` for JSON — right now, today
- The agency repo pattern works — Sound Transit's transforms live in their own repo, built as a separate Docker image
- Schedule dedup prevents unnecessary reprocessing when PIMS exports haven't changed
- Output assets are registered in the registry, closing the loop for the schedule→RT dependency (once schedule transforms are wired up)
- The system has already processed thousands of runs with no errors
## Open Questions
- DNS and HTTPS for the CDN endpoint — currently HTTP on a raw IP
- The schedule pipeline isn't running yet (only RT) — it needs a schedule-specific worker, or the same worker with `PIPELINE_DIR` pointed at the schedule folder
- How should the agency repo's CI build and push images? Currently using a manual `bin/build` script
- Should we publish `continuous-gtfs` as a real Python package to avoid the Docker build workaround?