← All projects
in-progress May 15, 2026

AI-augmented ETL pipeline automation

Using Claude as a development co-pilot to automate a manual 9-step monthly data load — with built-in validation, error detection, and a parallel UAT environment.

9 steps
manual → single command
#claude#node.js#bteq#automation#etl

Context

A critical monthly ETL process populating a multi-million-row production forecast table required manual execution across 9 SQL steps, 14 regional input files, and multiple stakeholder dependencies. The process was undocumented, SME-dependent, and entirely manual.

The problem

The pipeline ran once per month, required manual SQL execution in a specific order, and had no automated validation, no error detection, and no logging. File naming varied each month. One departure could make the process unrunnable. A Reduction in Force eliminated the SME mid-year, making automation urgent.

Methodology

Phase 1 — Process archaeology

Mapped the full 9-step SQL execution chain:

input file ingestion → staging table loads → view refreshes → join logic → production load

Documented all table dependencies, DELETE/INSERT sequences, and critical sequencing constraints. Identified the “DELETE before INSERT” step as the highest-risk manual step — if missed, production data would double-load.

Phase 2 — AI-assisted script generation

Used Claude as a development partner to:

  • Generate BTEQ scripts for each pipeline step from documented SQL logic
  • Identify and resolve edge cases:
    • BTEQ 16.x tab-delimiter incompatibilities (fixed by converting source files to pipe-delimited staging format)
    • FLOAT vs. DECIMAL column type mismatches causing binary parcel size errors (fixed by querying DBC.ColumnsV metadata at runtime to generate type-correct USING clauses)
    • ORDER BY constraints in outer single-column SELECT statements
  • Build a metadata-driven column-definition fetch step that queries the data warehouse at runtime — eliminating hardcoded type assumptions

Phase 3 — Orchestration & validation

Built a Node.js orchestrator that:

  • Executes BTEQ steps in sequence with exit-on-error
  • Validates input file presence and row counts before loading
  • Converts source files to BTEQ-compatible format automatically
  • Logs all steps with timestamps and row counts
  • Generates an HTML validation report comparing staging vs. production counts
  • Surfaces error lines from BTEQ logs for immediate review

Phase 4 — UAT environment

Created parallel UAT tables mirroring prod schema to run and validate the full pipeline without touching production. All fixes validated in UAT before any prod execution.

Stack

Node.js, BTEQ (Teradata), SQL, Claude (development co-pilot), Python.

Impact

  • Monthly manual 9-step process → single command execution
  • Built-in validation at every step; no silent failures
  • Zero production risk during development (full UAT environment)
  • Business continuity maintained solo following RIF headcount loss
  • First AI-native tooling adopted by the team