AI-augmented ETL pipeline automation
Using Claude as a development co-pilot to automate a manual 9-step monthly data load — with built-in validation, error detection, and a parallel UAT environment.
Context
A critical monthly ETL process populating a multi-million-row production forecast table required manual execution across 9 SQL steps, 14 regional input files, and multiple stakeholder dependencies. The process was undocumented, SME-dependent, and entirely manual.
The problem
The pipeline ran once per month, required manual SQL execution in a specific order, and had no automated validation, no error detection, and no logging. File naming varied each month. One departure could make the process unrunnable. A Reduction in Force eliminated the SME mid-year, making automation urgent.
Methodology
Phase 1 — Process archaeology
Mapped the full 9-step SQL execution chain:
input file ingestion → staging table loads → view refreshes → join logic → production load
Documented all table dependencies, DELETE/INSERT sequences, and critical sequencing constraints. Identified the “DELETE before INSERT” step as the highest-risk manual step — if missed, production data would double-load.
Phase 2 — AI-assisted script generation
Used Claude as a development partner to:
- Generate BTEQ scripts for each pipeline step from documented SQL logic
- Identify and resolve edge cases:
- BTEQ 16.x tab-delimiter incompatibilities (fixed by converting source files to pipe-delimited staging format)
- FLOAT vs. DECIMAL column type mismatches causing binary parcel size errors (fixed by querying
DBC.ColumnsVmetadata at runtime to generate type-correctUSINGclauses) - ORDER BY constraints in outer single-column SELECT statements
- Build a metadata-driven column-definition fetch step that queries the data warehouse at runtime — eliminating hardcoded type assumptions
Phase 3 — Orchestration & validation
Built a Node.js orchestrator that:
- Executes BTEQ steps in sequence with exit-on-error
- Validates input file presence and row counts before loading
- Converts source files to BTEQ-compatible format automatically
- Logs all steps with timestamps and row counts
- Generates an HTML validation report comparing staging vs. production counts
- Surfaces error lines from BTEQ logs for immediate review
Phase 4 — UAT environment
Created parallel UAT tables mirroring prod schema to run and validate the full pipeline without touching production. All fixes validated in UAT before any prod execution.
Stack
Node.js, BTEQ (Teradata), SQL, Claude (development co-pilot), Python.
Impact
- Monthly manual 9-step process → single command execution
- Built-in validation at every step; no silent failures
- Zero production risk during development (full UAT environment)
- Business continuity maintained solo following RIF headcount loss
- First AI-native tooling adopted by the team