
How Do We Know the RT Pipeline Didn't Break Anything?

March 20, 2026 — Experiment: rt-feed-comparison

The Question

When our pipeline processes a realtime GTFS feed (vehicle positions, trip updates, service alerts), how do we verify it didn't break anything? We can't just compare bytes: protobuf does not guarantee byte-stable serialization, and our pipeline updates timestamps. We need a comparison tool that understands what "equivalent" means for realtime transit data.

What We Tried

We built a comparison tool with four levels of equivalence, from strictest to most lenient:

  • Level 0: Byte-identical — exact same bytes (rare, mainly for sanity checks)
  • Level 1: Structurally identical — same parsed content in same order
  • Level 2: Semantically equivalent — same entities, ignoring order and timestamp differences
  • Level 3: Functionally equivalent — same meaningful content, ignoring stale vehicles that stopped reporting
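
To make the ladder concrete, here is a minimal sketch of how the four levels could be checked from strictest to most lenient. It is not our tool's actual code: the names (Equivalence, strictest_level), the flat-dict entity shape, and the stale_ids parameter are all assumptions for illustration, and header timestamp tolerance is left out.

```python
from enum import IntEnum


class Equivalence(IntEnum):
    BYTE_IDENTICAL = 0   # exact same bytes
    STRUCTURAL = 1       # same parsed content, same order
    SEMANTIC = 2         # same entities, order and timestamps ignored
    FUNCTIONAL = 3       # semantic match once stale entities are set aside


def _without_timestamp(entity):
    # Assumption: entities are flat dicts with an optional "timestamp" key.
    return {k: v for k, v in entity.items() if k != "timestamp"}


def _by_id(entities, skip_ids=frozenset()):
    # Keying by ID discards order. Duplicate IDs collapse to the last occurrence.
    return {
        e["id"]: _without_timestamp(e)
        for e in entities
        if e["id"] not in skip_ids
    }


def strictest_level(raw_a, raw_b, entities_a, entities_b, stale_ids=frozenset()):
    """Return the strictest Equivalence level the two feeds satisfy, or None."""
    if raw_a == raw_b:
        return Equivalence.BYTE_IDENTICAL
    if entities_a == entities_b:
        return Equivalence.STRUCTURAL
    if _by_id(entities_a) == _by_id(entities_b):
        return Equivalence.SEMANTIC
    if _by_id(entities_a, stale_ids) == _by_id(entities_b, stale_ids):
        return Equivalence.FUNCTIONAL
    return None
```

Checking from strictest to most lenient means the report can always name the strongest claim that holds, which is what the "Equivalence:" line in the reports further down conveys.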

What We Found

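  1. Sound Transit's trip_updates feed has duplicate entity IDs: two entities share the same ID in a 149-entity feed, even though the GTFS Realtime spec expects IDs to be unique within a feed message. Our tool catches this and warns about it. A consumer that keys entities by ID will silently keep only one of them (usually the last occurrence) and never notice.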

  2. Entity ordering is not stable: re-serializing the same feed can reorder entities. Any comparison tool that matches entities by position rather than by ID produces false alarms.

  3. Geographic precision matters: vehicle positions are stored as float32, which gives roughly 1 meter of precision. A position difference of 0.000003 degrees changes the bytes but is effectively the same location. Our tool rounds to 5 decimal places (about 1.1 m of latitude) by default.

  4. Stale entity handling needs to be configurable — some vehicles stop reporting but stay in the feed for minutes. Whether that's a "difference" depends on context. We classify them as "stale" separately from "removed."
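
The findings above translate into three small checks. The following is a sketch under stated assumptions rather than our tool's implementation: the function names, the dict-shaped entities, and the 300-second staleness threshold are hypothetical, and "stale" here means "missing from the output, but already older than the threshold in the input."

```python
from collections import Counter


def duplicate_ids(entities):
    """Return entity IDs that appear more than once, sorted."""
    counts = Counter(e["id"] for e in entities)
    return sorted(i for i, n in counts.items() if n > 1)


def same_position(a, b, decimals=5):
    """Compare (lat, lon) pairs after rounding to `decimals` places."""
    return (
        round(a[0], decimals) == round(b[0], decimals)
        and round(a[1], decimals) == round(b[1], decimals)
    )


def classify_missing(before, after_ids, feed_timestamp, stale_after_s=300):
    """Label entities that vanished as 'stale' or 'removed'.

    stale_after_s is a hypothetical default; the right value depends on
    how often vehicles in the feed normally report.
    """
    labels = {}
    for e in before:
        if e["id"] in after_ids:
            continue
        age = feed_timestamp - e["timestamp"]
        labels[e["id"]] = "stale" if age > stale_after_s else "removed"
    return labels
```

One caveat with rounding: two values that differ by less than the tolerance can still land on opposite sides of a rounding boundary and compare as different. An absolute-tolerance comparison (for example math.isclose with abs_tol) avoids that, at the cost of no longer being usable as a dictionary key.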

What It Looks Like

Here's what our pipeline's comparison report looks like for a passthrough (no transforms applied):

GTFS-RT Comparison: vehicle_positions
Equivalence: Level 2 (semantically equivalent)
Header: timestamp delta 3s (within 30s tolerance), version match ✓

Entities: 42 vs 42
  ✓ 42 matched
  + 0 added
  - 0 removed

And here's what it looks like after applying a transform that renames vehicle IDs:

GTFS-RT Comparison: vehicle_positions
Equivalence: not equivalent
Header: timestamp delta 1s, version match ✓

Entities: 42 vs 42
  ✓ 0 matched
  ~ 42 modified
  + 0 added
  - 0 removed

Modifications:
  [~] v_101: vehicle.vehicle.id: "101" → "ST-101"
  [~] v_102: vehicle.vehicle.id: "102" → "ST-102"
  ...all 42 vehicles renamed as expected

The tool tells you exactly what changed — not just "different" but which entities, which fields, and what the old and new values are.
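
Field-level reporting of this kind can be sketched as a recursive walk over two parsed entities. This is an illustration only: it assumes entities have already been converted to nested dicts, and diff_fields and format_change are hypothetical names, not functions from our tool.

```python
import json


def diff_fields(old, new, prefix=""):
    """Return (path, old_value, new_value) for every leaf that differs."""
    changes = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}.{key}" if prefix else key
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested messages, extending the dotted path.
            changes.extend(diff_fields(a, b, path))
        elif a != b:
            changes.append((path, a, b))
    return changes


def format_change(entity_id, path, old, new):
    """Render one change in the style of the report lines above."""
    return f"[~] {entity_id}: {path}: {json.dumps(old)} → {json.dumps(new)}"


old = {"vehicle": {"vehicle": {"id": "101"}, "position": {"latitude": 47.6}}}
new = {"vehicle": {"vehicle": {"id": "ST-101"}, "position": {"latitude": 47.6}}}
for change in diff_fields(old, new):
    print(format_change("v_101", *change))
```

A field present on only one side shows up with None as the missing value, so added and dropped fields are reported the same way as changed ones.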

The Decision

We are going with four-level equivalence and configurable tolerances. The tool is designed for visibility: it tells you what the pipeline did in every case, whether that's "nothing changed" or "here are the 3 specific fields that differ."

What This Means

  • Every pipeline run can be verified automatically: "did our transforms produce the expected changes and nothing else?"
  • Sound Transit can see exactly what our system does to their feeds before going live
  • Duplicate entity IDs and stale vehicles are surfaced, not hidden — giving operators full visibility
  • Milestone validation is concrete: "here's the comparison report proving functional equivalence"

Open Questions

  • Should comparison run continuously or on-demand? We could compare every pipeline cycle (every 20s) and alert on unexpected differences, or run it as a validation step during deployment and version changes. Continuous comparison adds monitoring overhead but catches regressions immediately.
  • How do we handle DIFFERENTIAL incrementality? Sound Transit currently uses FULL_DATASET mode, but if they switch to DIFFERENTIAL, a missing entity means "unchanged," not "removed." The comparison tool supports both modes, but we haven't tested it against a real DIFFERENTIAL feed.
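
The difference between the two modes can be sketched as follows. This is our working assumption about DIFFERENTIAL semantics, not tested behavior: the GTFS Realtime reference describes that mode as currently unsupported with unspecified behavior, and classify_absent and the dict-shaped entities are hypothetical. The only spec-defined piece used here is the is_deleted flag on a feed entity.

```python
def classify_absent(before_ids, after_entities, incrementality="FULL_DATASET"):
    """Interpret entities seen earlier that are absent from the new message."""
    after_ids = {e["id"] for e in after_entities}
    absent = sorted(set(before_ids) - after_ids)
    if incrementality == "DIFFERENTIAL":
        # Absence means "no update". Removal must be stated explicitly.
        deleted = sorted(e["id"] for e in after_entities if e.get("is_deleted"))
        return {"removed": deleted, "unchanged": absent}
    # FULL_DATASET: the message replaces everything, so absence is removal.
    return {"removed": absent, "unchanged": []}
```

Keeping the mode as an explicit argument makes the interpretation visible in the report instead of burying it in the matching logic.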