How Do We Know the RT Pipeline Didn't Break Anything?

March 20, 2026 — Experiment: rt-feed-comparison

The Question

When our pipeline processes a realtime GTFS feed (vehicle positions, trip updates, service alerts), how do we verify it didn't break anything? We can't just compare bytes: protobuf serialization is not guaranteed to be byte-stable, so the same message can serialize to different bytes, and our pipeline updates timestamps anyway. We need a comparison tool that understands what "equivalent" means for realtime transit data.

What We Tried

We built a comparison tool with four levels of equivalence, from strictest to most lenient:

Level 0: Byte-identical — exact same bytes (rare, mainly for sanity checks)
Level 1: Structurally identical — same parsed content in same order
Level 2: Semantically equivalent — same entities, ignoring order and timestamp differences
Level 3: Functionally equivalent — same meaningful content, ignoring stale vehicles that stopped reporting
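
A rough sketch of how the four levels could be checked in Python. It assumes entities have already been parsed into plain dicts keyed like the GTFS-RT messages; the names Level, classify, IGNORED_FIELDS, and stale_ids are illustrative, not the tool's actual API.

```python
from enum import IntEnum


class Level(IntEnum):
    BYTE_IDENTICAL = 0
    STRUCTURAL = 1
    SEMANTIC = 2
    FUNCTIONAL = 3


# Assumption: only fields named "timestamp" are ignored at Level 2 and above.
IGNORED_FIELDS = {"timestamp"}


def strip_ignored(value):
    """Recursively drop ignored fields from a parsed entity."""
    if isinstance(value, dict):
        return {k: strip_ignored(v) for k, v in value.items()
                if k not in IGNORED_FIELDS}
    if isinstance(value, list):
        return [strip_ignored(v) for v in value]
    return value


def by_id(entities):
    """Index entities by ID so that feed order no longer matters."""
    return {e["id"]: strip_ignored(e) for e in entities}


def classify(raw_a, raw_b, entities_a, entities_b, stale_ids=frozenset()):
    """Return the strictest level the two feeds satisfy, or None."""
    if raw_a == raw_b:
        return Level.BYTE_IDENTICAL
    if entities_a == entities_b:  # same content, same order
        return Level.STRUCTURAL
    a, b = by_id(entities_a), by_id(entities_b)
    if a == b:  # order and timestamps ignored
        return Level.SEMANTIC
    live_a = {k: v for k, v in a.items() if k not in stale_ids}
    live_b = {k: v for k, v in b.items() if k not in stale_ids}
    if live_a == live_b:  # stale vehicles ignored as well
        return Level.FUNCTIONAL
    return None
```

Checking from strictest to most lenient means the reported level is always the strongest claim the two feeds support.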

What We Found

Sound Transit's trip_updates feed has duplicate entity IDs: two entities share the same ID in a 149-entity feed, even though GTFS Realtime entity IDs are meant to be unique within a feed. Our tool catches this and warns about it. A consumer that indexes entities by ID will typically keep only the last occurrence and never notice.
Entity ordering is not stable: re-serializing the same feed can reorder entities, and the order carries no meaning. Any comparison tool that assumes a fixed order produces false alarms.
Geographic precision matters: vehicle positions are stored as float32, which gives roughly 1 meter of precision. A position difference of 0.000003 degrees (well under a meter) looks different in bytes but is the same location. Our tool rounds to 5 decimal places (about 1 meter) by default.
Stale entity handling needs to be configurable — some vehicles stop reporting but stay in the feed for minutes. Whether that's a "difference" depends on context. We classify them as "stale" separately from "removed."
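
The checks behind these findings can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: entities are plain dicts, and the helper names and the 120-second max_age_s threshold are hypothetical.

```python
from collections import Counter


def duplicate_ids(entities):
    """Return entity IDs that appear more than once in a feed."""
    counts = Counter(e["id"] for e in entities)
    return sorted(i for i, n in counts.items() if n > 1)


def round_position(lat, lon, places=5):
    """Round coordinates so sub-meter float noise compares equal."""
    return (round(lat, places), round(lon, places))


def is_stale(entity, header_timestamp, max_age_s=120):
    """A vehicle is stale if it is still in the feed but its own
    timestamp lags the feed header by more than max_age_s seconds."""
    ts = entity.get("vehicle", {}).get("timestamp")
    return ts is not None and header_timestamp - ts > max_age_s
```

Rounding can still separate two nearly identical values that straddle a rounding boundary, so an absolute tolerance check is a possible alternative to consider.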

What It Looks Like

Here's what our pipeline's comparison report looks like for a passthrough (no transforms applied):

GTFS-RT Comparison: vehicle_positions
Equivalence: Level 2 (semantically equivalent)
Header: timestamp delta 3s (within 30s tolerance), version match ✓

Entities: 42 vs 42
  ✓ 42 matched
  + 0 added
  - 0 removed
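
The header line and entity counts in a report like this could be derived roughly as follows. The function names are hypothetical, and the 30-second default simply mirrors the tolerance shown in the report.

```python
def header_delta(ts_a, ts_b, tolerance_s=30):
    """Return (delta in seconds, whether it is within tolerance)."""
    delta = abs(ts_a - ts_b)
    return delta, delta <= tolerance_s


def entity_counts(old_by_id, new_by_id):
    """Count matched, modified, added, and removed entities.
    Both arguments are dicts mapping entity ID to parsed entity."""
    shared = old_by_id.keys() & new_by_id.keys()
    matched = sum(1 for i in shared if old_by_id[i] == new_by_id[i])
    return {
        "matched": matched,
        "modified": len(shared) - matched,
        "added": len(new_by_id.keys() - old_by_id.keys()),
        "removed": len(old_by_id.keys() - new_by_id.keys()),
    }
```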

And here's what it looks like after applying a transform that renames vehicle IDs:

GTFS-RT Comparison: vehicle_positions
Equivalence: not equivalent
Header: timestamp delta 1s, version match ✓

Entities: 42 vs 42
  ✓ 0 matched
  ~ 42 modified
  + 0 added
  - 0 removed

Modifications:
  [~] v_101: vehicle.vehicle.id: "101" → "ST-101"
  [~] v_102: vehicle.vehicle.id: "102" → "ST-102"
  ...all 42 vehicles renamed as expected

The tool tells you exactly what changed — not just "different" but which entities, which fields, and what the old and new values are.
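
A field-level diff of this kind can be sketched as a recursive walk over two parsed entities. The version below assumes entities are nested dicts; diff_fields and format_modifications are illustrative names, and the output format is only an approximation of the report above.

```python
import json


def diff_fields(old, new, path=""):
    """Yield (path, old_value, new_value) for every differing field."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}" if path else key
            yield from diff_fields(old.get(key), new.get(key), child)
    elif old != new:
        yield (path, old, new)


def format_modifications(old_by_id, new_by_id):
    """Build one report line per changed field of each shared entity."""
    lines = []
    for entity_id in sorted(old_by_id.keys() & new_by_id.keys()):
        changes = diff_fields(old_by_id[entity_id], new_by_id[entity_id])
        for path, before, after in changes:
            lines.append(
                f"[~] {entity_id}: {path}: "
                f"{json.dumps(before)} → {json.dumps(after)}"
            )
    return lines
```

Reporting the dotted field path alongside the old and new values is what lets a reviewer confirm that a rename transform touched vehicle.vehicle.id and nothing else.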

The Decision

Four-level equivalence with configurable tolerances. The tool is designed for visibility — it tells you what the pipeline did in every case, whether that's "nothing changed" or "here are the 3 specific fields that differ."

What This Means

Every pipeline run can be verified automatically: "did our transforms produce the expected changes and nothing else?"
Sound Transit can see exactly what our system does to their feeds before going live
Duplicate entity IDs and stale vehicles are surfaced, not hidden — giving operators full visibility
Milestone validation is concrete: "here's the comparison report proving functional equivalence"

Open Questions

Should comparison run continuously or on-demand? We could compare every pipeline cycle (every 20s) and alert on unexpected differences, or run it as a validation step during deployment and version changes. Continuous comparison adds monitoring overhead but catches regressions immediately.
How do we handle DIFFERENTIAL incrementality? Sound Transit currently uses FULL_DATASET mode, but if they switch to DIFFERENTIAL, a missing entity means "unchanged", not "removed". The comparison tool supports both modes, but we haven't tested it against a real DIFFERENTIAL feed.
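
To make the DIFFERENTIAL question concrete, here is a minimal sketch of how a missing entity might be interpreted under each mode. It assumes entities are plain dicts and that deletions in DIFFERENTIAL mode are signalled by the GTFS-RT is_deleted flag; resolve_missing is a hypothetical helper, and none of this has been checked against a real DIFFERENTIAL feed.

```python
def resolve_missing(previous_ids, entities, incrementality):
    """Split previously seen IDs into removed vs. assumed unchanged."""
    mentioned = {e["id"] for e in entities}
    if incrementality == "FULL_DATASET":
        # The feed is a complete snapshot: absent means removed.
        return {"removed": previous_ids - mentioned, "unchanged": set()}
    if incrementality == "DIFFERENTIAL":
        # Only explicitly deleted entities are removed; entities the
        # feed does not mention are assumed unchanged.
        deleted = {e["id"] for e in entities if e.get("is_deleted", False)}
        return {
            "removed": previous_ids & deleted,
            "unchanged": previous_ids - mentioned,
        }
    raise ValueError(f"unknown incrementality: {incrementality}")
```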