In progress · May 15, 2026

Teradata → BigQuery migration assistant

LLM-assisted translator paired with a deterministic diff harness that validates row-level equivalence before cutover.

120+ models migrated
#teradata #bigquery #claude #migration

The problem

We’re in the middle of a company-wide migration off Teradata onto BigQuery. The reporting suite, the warehouse, and dozens of downstream pipelines all need to land cleanly on the other side. Some queries are small — a SELECT and a couple of joins. Others are 1,500 lines of dialect soup, with Teradata-specific functions, recursive CTEs, and BTEQ-flavored quirks.

Hand-porting that volume of SQL would burn months of analyst time and introduce silent regressions — because a SQL translation that runs without errors isn’t necessarily one that produces the same rows. Integer division is the classic example: Teradata truncates, while BigQuery’s division operator returns a float, so the same query can return different numbers on each side.

The approach

Two pieces, both deliberately simple:

1. An LLM-assisted translator. I feed Teradata SQL into Claude with a system prompt containing the dialect mapping rules — Teradata-isms that need rewriting (CAST quirks, QUALIFY clauses, MULTISET tables, TOP N → LIMIT, the works). The model returns BigQuery SQL plus a short rationale of the substitutions it made (a sketch of the call follows this list).

2. A deterministic diff harness. This is the part that actually matters. The harness runs the original Teradata query and the translated BigQuery query against representative data, then compares result sets row-by-row, column-by-column. If the diff is empty, the translation is safe to promote. If not, the harness emits a structured report — which rows differ, which columns, what the values look like on each side.
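
For concreteness, here is a minimal sketch of what the translator call (piece 1) might look like with the Anthropic Python SDK. The model name, the rule excerpt, and the translate() helper are illustrative stand-ins, not the production prompt:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative excerpt of the dialect-mapping rules; the real prompt is much longer.
DIALECT_RULES = """\
You translate Teradata SQL into BigQuery Standard SQL.
Rewrite rules include:
- SELECT TOP n ...        -> SELECT ... LIMIT n
- SEL                     -> SELECT
- CREATE MULTISET TABLE   -> CREATE TABLE (BigQuery allows duplicate rows by default)
- CAST(x AS FORMAT '...') -> the matching FORMAT_* / PARSE_* function
Return the translated SQL first, then a short bullet list of the substitutions you made.
"""

def translate(teradata_sql: str, model: str = "claude-sonnet-4-5") -> str:
    """Send one Teradata query to the model and return the raw response text."""
    response = client.messages.create(
        model=model,
        max_tokens=8000,
        system=DIALECT_RULES,
        messages=[{"role": "user", "content": teradata_sql}],
    )
    return response.content[0].text
```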
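
And a minimal sketch of the diff itself (piece 2), assuming each result set comes back as a list of column-to-value dicts and can be keyed on a stable set of columns. The DiffReport and diff_result_sets names are illustrative, and the real harness also has to worry about type coercion, float tolerance, and ordering:

```python
from dataclasses import dataclass, field

@dataclass
class DiffReport:
    """Structured comparison result: which rows and cells differ, and how."""
    missing_rows: list = field(default_factory=list)  # keys present only in Teradata
    extra_rows: list = field(default_factory=list)    # keys present only in BigQuery
    cell_diffs: list = field(default_factory=list)    # (key, column, td_value, bq_value)

    @property
    def is_clean(self) -> bool:
        return not (self.missing_rows or self.extra_rows or self.cell_diffs)

def diff_result_sets(td_rows, bq_rows, key_columns, compare_columns) -> DiffReport:
    """Compare two result sets keyed on key_columns; each row is a dict of column -> value."""
    report = DiffReport()
    td_index = {tuple(row[c] for c in key_columns): row for row in td_rows}
    bq_index = {tuple(row[c] for c in key_columns): row for row in bq_rows}

    for key, td_row in td_index.items():
        bq_row = bq_index.get(key)
        if bq_row is None:
            report.missing_rows.append(key)
            continue
        for col in compare_columns:
            if td_row[col] != bq_row[col]:  # NULL vs. value surfaces here as None != x
                report.cell_diffs.append((key, col, td_row[col], bq_row[col]))

    report.extra_rows = [key for key in bq_index if key not in td_index]
    return report
```

If is_clean comes back True the translation gets promoted; if not, cell_diffs is the structured report the reviewer starts from.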

Why the diff harness mattered

LLMs are good at SQL. Mostly. The first version of this pipeline relied on the translator’s output plus a human eyeballing the result — a workflow that stops scaling the moment the queue gets longer than a reviewer’s attention span.

The harness moved us from “trust the LLM” to “verify the LLM.” That single shift unlocked the velocity. I can run the translator over a batch of 30 queries and the harness tells me unambiguously which ones are safe to promote and which need a second pass. The reviewer’s job becomes “look at the failing diffs” rather than “audit every line.”

What worked

  • Mid-complexity queries (joins, window functions, basic UDFs) translated cleanly on first pass roughly 90% of the time.
  • The harness caught NULL-handling edge cases and timezone divergences — silent bugs we wouldn’t have found in code review.
  • Chunking long queries by CTE before sending to the model made a measurable difference. Anything over ~1,000 lines became more reliable when fed in named blocks (see the sketch after this list).
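
A rough sketch of that chunking step: a parenthesis-balancing splitter that pulls named CTE blocks out of a WITH query, the idea being that each named block goes to the model on its own and the pieces are stitched back together before the harness runs. Treat it as illustrative; it ignores comments and string literals, which a production splitter would have to handle.

```python
import re

def split_by_cte(sql: str):
    """Crude splitter for 'WITH a AS (...), b AS (...) SELECT ...' queries.
    Returns (name, body) pairs for each CTE plus a final ('__final__', trailing SELECT).
    Does not handle comments or string literals containing parentheses."""
    head = re.match(r"\s*WITH\s+", sql, flags=re.IGNORECASE)
    if not head:
        return [("__final__", sql.strip())]

    blocks, pos = [], head.end()
    while True:
        name = re.match(r"(\w+)\s+AS\s*\(", sql[pos:], flags=re.IGNORECASE)
        if not name:
            break
        depth, i = 1, pos + name.end()
        while depth and i < len(sql):  # walk to the matching closing parenthesis
            depth += {"(": 1, ")": -1}.get(sql[i], 0)
            i += 1
        blocks.append((name.group(1), sql[pos + name.end():i - 1].strip()))
        pos = i
        comma = re.match(r"\s*,\s*", sql[pos:])
        if not comma:
            break
        pos += comma.end()

    blocks.append(("__final__", sql[pos:].strip()))
    return blocks
```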

What didn’t

  • Recursive CTEs. Teradata’s recursive semantics don’t map cleanly to BigQuery in every case. We fell back to manual rewrites for these.
  • Stored procedures with control flow (LOOP, IF, BEGIN…END) are out of scope. The translator targets SELECTs, not procedural blocks.

What I’d do differently

Build the diff harness first. I built the translator first because it was the obvious “AI-shaped” piece. Two sprints in, we had a translator with no way to trust it — and we built the harness afterward, effectively redoing validation work. If I were starting again, the harness would come first, and the translator would slot into a workflow that already knew what “correct” looked like.

Status

In progress. 120+ models migrated through this pipeline so far. Several weeks of runway to clear the remaining queue.