How Do We Know the RT Pipeline Didn't Break Anything?
March 20, 2026 — Experiment: rt-feed-comparison
The Question
When our pipeline processes a realtime GTFS feed (vehicle positions, trip updates, service alerts), how do we verify it didn't break anything? We can't just compare bytes: protobuf serialization isn't guaranteed to be deterministic, so the same message can serialize to different bytes, and our pipeline updates timestamps anyway. We need a comparison tool that understands what "equivalent" means for realtime transit data.
What We Tried
We built a comparison tool with four levels of equivalence, from strictest to most lenient:
- Level 0: Byte-identical — exact same bytes (rare, mainly for sanity checks)
- Level 1: Structurally identical — same parsed content in same order
- Level 2: Semantically equivalent — same entities, ignoring order and timestamp differences
- Level 3: Functionally equivalent — same meaningful content, ignoring stale vehicles that stopped reporting
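To make the nesting of the four levels concrete, here is a minimal sketch of how a classifier could walk from strictest to most lenient. It is not the tool's actual code: the names `Equivalence`, `strip_timestamps`, and `classify` are hypothetical, plain dicts stand in for parsed feed entities, and the set of stale IDs is assumed to be computed elsewhere.

```python
from enum import IntEnum


class Equivalence(IntEnum):
    """Hypothetical names for the four levels; a lower value is stricter."""
    BYTE_IDENTICAL = 0
    STRUCTURAL = 1
    SEMANTIC = 2
    FUNCTIONAL = 3


# Assumption: every field named "timestamp" is ignored from Level 2 upward.
TIMESTAMP_KEYS = {"timestamp"}


def strip_timestamps(value):
    """Recursively drop timestamp keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_timestamps(v) for k, v in value.items()
                if k not in TIMESTAMP_KEYS}
    if isinstance(value, list):
        return [strip_timestamps(v) for v in value]
    return value


def classify(raw_a, raw_b, entities_a, entities_b, stale_ids=frozenset()):
    """Return the strictest level at which the two feeds match, or None."""
    if raw_a == raw_b:
        return Equivalence.BYTE_IDENTICAL
    if entities_a == entities_b:  # same parsed content, same order
        return Equivalence.STRUCTURAL
    # Indexing by ID discards order. Duplicate IDs would collapse here,
    # so they need to be detected and reported before this step.
    by_id_a = {e["id"]: strip_timestamps(e) for e in entities_a}
    by_id_b = {e["id"]: strip_timestamps(e) for e in entities_b}
    if by_id_a == by_id_b:
        return Equivalence.SEMANTIC
    live_a = {k: v for k, v in by_id_a.items() if k not in stale_ids}
    live_b = {k: v for k, v in by_id_b.items() if k not in stale_ids}
    if live_a == live_b:
        return Equivalence.FUNCTIONAL
    return None
```

Returning the strictest matching level, rather than a yes/no answer, is what lets a report say "Level 2" for a passthrough and "not equivalent" for a transform.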
What We Found
- Sound Transit's trip_updates feed has duplicate entity IDs: two entities share the same ID in a 149-entity feed. Our tool catches this and warns about it. Most consumers silently use the last occurrence and never notice.
- Entity ordering is not stable: re-serializing the same feed can reorder entities. Any comparison tool that assumes a fixed order produces false alarms.
- Geographic precision matters: vehicle positions are stored as float32, which gives roughly 1 meter of precision. A position difference of 0.000003 degrees looks different in bytes but is the same location for practical purposes. Our tool rounds to 5 decimal places (about 1.1 m of latitude) by default.
- Stale entity handling needs to be configurable: some vehicles stop reporting but stay in the feed for minutes. Whether that's a "difference" depends on context. We classify them as "stale" separately from "removed."
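The four findings above each map to a small check. The sketch below is illustrative only: the function names are hypothetical, entities are plain dicts rather than parsed protobuf messages, and the staleness threshold is a caller-supplied parameter because the post does not state the tool's default.

```python
from collections import Counter


def duplicate_ids(entities):
    """Return entity IDs that occur more than once, in first-seen order."""
    counts = Counter(e["id"] for e in entities)
    return [entity_id for entity_id, n in counts.items() if n > 1]


def index_by_id(entities):
    """Index entities by ID so that comparison ignores feed order.

    The last occurrence wins on duplicates, so call duplicate_ids()
    first if the collision should be reported rather than hidden.
    """
    return {e["id"]: e for e in entities}


def same_position(pos_a, pos_b, decimals=5):
    """Compare latitude and longitude after rounding to `decimals` places."""
    return all(
        round(pos_a[key], decimals) == round(pos_b[key], decimals)
        for key in ("latitude", "longitude")
    )


def is_stale(entity, feed_timestamp, max_age_s):
    """True if the vehicle's own timestamp lags the feed header by more
    than max_age_s seconds. Entities without a timestamp are not stale."""
    reported = entity["vehicle"].get("timestamp")
    return reported is not None and feed_timestamp - reported > max_age_s
```

One caveat with rounding: two values that sit on either side of a rounding boundary compare as different even when they are closer than the tolerance. An absolute-difference check (for example with `math.isclose` and `abs_tol`) avoids that, at the cost of no longer being transitive.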
What It Looks Like
Here's what our pipeline's comparison report looks like for a passthrough (no transforms applied):
    GTFS-RT Comparison: vehicle_positions
    Equivalence: Level 2 (semantically equivalent)
    Header: timestamp delta 3s (within 30s tolerance), version match ✓
    Entities: 42 vs 42
      ✓ 42 matched
      + 0 added
      - 0 removed
And here's what it looks like after applying a transform that renames vehicle IDs:
    GTFS-RT Comparison: vehicle_positions
    Equivalence: not equivalent
    Header: timestamp delta 1s, version match ✓
    Entities: 42 vs 42
      ✓ 0 matched
      ~ 42 modified
      + 0 added
      - 0 removed
    Modifications:
      [~] v_101: vehicle.vehicle.id: "101" → "ST-101"
      [~] v_102: vehicle.vehicle.id: "102" → "ST-102"
      ...all 42 vehicles renamed as expected
The tool tells you exactly what changed: not just "different," but which entities, which fields, and what the old and new values are.
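Field-level lines like the ones in the report can come from a recursive walk over two parsed entities. This is a sketch under assumptions, not the tool's implementation: `diff_fields` and `format_modifications` are hypothetical names, entities are nested dicts keyed by entity ID, a missing field is treated the same as `None`, and lists are compared as whole values.

```python
def diff_fields(old, new, path=""):
    """Yield (field_path, old_value, new_value) for every differing leaf."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}" if path else key
            # .get() returns None for a field present on only one side.
            yield from diff_fields(old.get(key), new.get(key), child)
    elif old != new:
        yield path, old, new


def format_modifications(before, after):
    """Render one line per changed field for entities present in both feeds."""
    lines = []
    for entity_id in sorted(before.keys() & after.keys()):
        changes = diff_fields(before[entity_id], after[entity_id])
        for field, old, new in changes:
            lines.append(f'[~] {entity_id}: {field}: "{old}" → "{new}"')
    return lines
```

Building the dotted path during recursion is what makes a line such as `vehicle.vehicle.id` possible without a separate schema lookup.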
The Decision
We are going with four-level equivalence and configurable tolerances. The tool is designed for visibility: it tells you what the pipeline did in every case, whether that's "nothing changed" or "here are the 3 specific fields that differ."
What This Means
- Every pipeline run can be verified automatically: "did our transforms produce the expected changes and nothing else?"
- Sound Transit can see exactly what our system does to their feeds before going live
- Duplicate entity IDs and stale vehicles are surfaced, not hidden — giving operators full visibility
- Milestone validation is concrete: "here's the comparison report proving functional equivalence"
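The "expected changes and nothing else" check in the first point could be automated with an allow-list of field paths. The sketch below is one possible shape and assumes a hypothetical `ComparisonReport` structure; the real report format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ComparisonReport:
    """Hypothetical summary of one comparison run."""
    added: list = field(default_factory=list)     # entity IDs only in the output
    removed: list = field(default_factory=list)   # entity IDs only in the input
    modified: list = field(default_factory=list)  # (entity_id, field_path) pairs


def unexpected(report, allowed_fields):
    """List everything the transforms were not expected to do."""
    problems = [f"added: {entity_id}" for entity_id in report.added]
    problems += [f"removed: {entity_id}" for entity_id in report.removed]
    problems += [
        f"modified: {entity_id} {path}"
        for entity_id, path in report.modified
        if path not in allowed_fields
    ]
    return problems
```

For the vehicle ID rename shown earlier, the allow-list would contain only `vehicle.vehicle.id`, and an empty result would mean the run did exactly what was intended.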
Open Questions
- Should comparison run continuously or on-demand? We could compare every pipeline cycle (every 20s) and alert on unexpected differences, or run it as a validation step during deployment and version changes. Continuous comparison adds monitoring overhead but catches regressions immediately.
- How do we handle DIFFERENTIAL incrementality? Sound Transit currently uses FULL_DATASET mode, but if they switch to DIFFERENTIAL, "missing entity" means "unchanged" not "removed." The comparison tool supports both but we haven't tested against a real DIFFERENTIAL feed.
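To illustrate why incrementality changes the meaning of a missing entity, here is a small sketch. The function name and return labels are hypothetical, entities are plain dicts keyed by ID, and `is_deleted` mirrors the FeedEntity field of the same name in the GTFS Realtime spec. It has not been run against a real DIFFERENTIAL feed.

```python
def classify_missing(before, after, incrementality):
    """Label entity IDs from `before` that are absent from, or explicitly
    deleted in, `after`.

    FULL_DATASET: an absent entity was removed.
    DIFFERENTIAL: an absent entity is unchanged; only is_deleted removes it.
    """
    differential = incrementality == "DIFFERENTIAL"
    result = {}
    for entity_id in before:
        entity = after.get(entity_id)
        if entity is None:
            result[entity_id] = "unchanged" if differential else "removed"
        elif differential and entity.get("is_deleted", False):
            result[entity_id] = "removed"
    return result
```

The same pair of feeds yields opposite answers depending on the header's incrementality value, which is why the comparison has to read it before counting removals.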