in-progress May 15, 2026

AI-augmented ETL pipeline automation

Using Claude as a development co-pilot to automate a manual 9-step monthly data load — with built-in validation, error detection, and a parallel UAT environment.

9 steps

manual → single command

#claude#node.js#bteq#automation#etl

Context

A critical monthly ETL process populating a multi-million-row production forecast table required manual execution across 9 SQL steps, 14 regional input files, and multiple stakeholder dependencies. The process was undocumented, SME-dependent, and entirely manual.

The problem

The pipeline ran once per month, required manual SQL execution in a specific order, and had no automated validation, no error detection, and no logging. File naming varied each month. One departure could make the process unrunnable. A Reduction in Force eliminated the SME mid-year, making automation urgent.

Methodology

Phase 1 — Process archaeology

Mapped the full 9-step SQL execution chain:

input file ingestion → staging table loads → view refreshes → join logic → production load

Documented all table dependencies, DELETE/INSERT sequences, and critical sequencing constraints. Identified the “DELETE before INSERT” step as the highest-risk manual step — if missed, production data would double-load.

Phase 2 — AI-assisted script generation

Used Claude as a development partner to:

Generate BTEQ scripts for each pipeline step from documented SQL logic
Identify and resolve edge cases:
- BTEQ 16.x tab-delimiter incompatibilities (fixed by converting source files to pipe-delimited staging format)
- FLOAT vs. DECIMAL column type mismatches causing binary parcel size errors (fixed by querying DBC.ColumnsV metadata at runtime to generate type-correct USING clauses)
- ORDER BY constraints in outer single-column SELECT statements
Build a metadata-driven column-definition fetch step that queries the data warehouse at runtime — eliminating hardcoded type assumptions

Phase 3 — Orchestration & validation

Built a Node.js orchestrator that:

Executes BTEQ steps in sequence with exit-on-error
Validates input file presence and row counts before loading
Converts source files to BTEQ-compatible format automatically
Logs all steps with timestamps and row counts
Generates an HTML validation report comparing staging vs. production counts
Surfaces error lines from BTEQ logs for immediate review

Phase 4 — UAT environment

Created parallel UAT tables mirroring prod schema to run and validate the full pipeline without touching production. All fixes validated in UAT before any prod execution.

Stack

Node.js, BTEQ (Teradata), SQL, Claude (development co-pilot), Python.

Impact

Monthly manual 9-step process → single command execution
Built-in validation at every step; no silent failures
Zero production risk during development (full UAT environment)
Business continuity maintained solo following RIF headcount loss
First AI-native tooling adopted by the team