From SSIS to dbt in a Weekend: The Agentic Harness Migration Pattern
The 6-month consulting migration is over. A working playbook for translating SSIS packages to dbt models using deterministic parsing, AI-assisted rewriting, and hash-diff parity checking.
Hundreds of thousands of production SSIS packages still run in Fortune 500 data estates in 2026. Health insurance carriers depend on them for claims reporting. Banks depend on them for regulatory feeds. Retail and CPG companies depend on them for nightly inventory reconciliation. The packages are not glamorous. They are critical. And until very recently, modernizing them required the most expensive shape of consulting engagement available in the data ecosystem.
The standard SSIS-to-dbt migration runs four to six months of senior consultant time, costs between $200,000 and $300,000 for a mid-sized estate of 50 packages, and concludes with a parity-validation phase that consumes more budget than translation itself. The economics work for consultancies. They do not work for the companies paying the bill.
The agentic harness pattern reshapes this economics. A 10-package migration that used to consume a four-week senior-engineer sprint now completes in a single weekend with deterministic parsing, AI-assisted SQL translation, and hash-diff parity validation. Cost reductions in the 70 to 90 percent range are credible. The migration is no longer a budget event. It is an engineering decision.
This playbook documents the working pattern.
Why SSIS Persists in 2026
A brief context note for readers outside the legacy data world. SSIS is SQL Server Integration Services, Microsoft’s ETL tool introduced in 2005. Dataflows are defined in XML, edited through a GUI in Visual Studio, and tightly coupled to SQL Server as both source and target system. The toolchain pre-dates version control as a meaningful development practice. Most SSIS packages in production were authored before Git became the universal default.
Three properties make SSIS estates uniquely difficult to modernize. First, the XML format is unwieldy for diff-based review, which makes the code resistant to incremental refactoring. Second, the GUI-driven authoring pattern leaves behind packages with cryptic auto-generated names (COL_1, Out_3) that obscure intent. Third, business logic is often distributed across SSIS dataflows, T-SQL stored procedures, and external SQL Server jobs, making the full transformation graph opaque without manual archaeology.
Most SSIS estates were inherited rather than chosen. The teams running them have a clear mandate to migrate but a justifiable hesitation about the cost. New analytics projects rarely start on SSIS in 2026. The problem is the existing estate, holding the reporting, finance close, and regulatory feeds that the business runs on.
Why dbt Is the Target
The case for dbt as the migration target is well-established by 2026. dbt’s transformation model (modular SQL with Jinja templating, version-controlled, test-driven, materialized into a configurable warehouse) inverts every pain point of SSIS. The dbt ecosystem (lineage tools, semantic layer, materialization options, the recently-GA Fusion engine, the Power User VS Code extension) means the migration investment compounds over time rather than locking the team into another proprietary tool.
The underlying warehouse migration (typically from on-premises SQL Server to Snowflake, Databricks, or BigQuery) is usually scoped alongside the SSIS-to-dbt translation. The combination delivers both modernized transformation logic and modernized compute, which materially reduces total cost of ownership against the on-premises SQL Server estate.
The strategic point is that lift-and-shift translation is the wrong goal. The right goal is rebuild-and-modernize, with the dbt project layout (staging, intermediate, marts) explicitly designed to be more maintainable than the SSIS originals. The agent makes the rebuild affordable enough that this becomes the practical default.
The Traditional Migration Playbook
For context on what the new pattern is replacing, the traditional consulting playbook runs through four phases.
Discovery phase (weeks 1 to 4). A senior consultant reads every SSIS package. The output is a spreadsheet cataloging source connections, transformations, target tables, and dependencies for each package. A typical senior engineer working manually clocks 5 to 10 packages per week.
Translation phase (weeks 5 to 16). A senior engineer rewrites each package as dbt models. Staging models for raw extracts. Intermediate models for transformation steps. Mart models for final outputs. A senior engineer clocks 3 to 5 packages per week including testing.
Validation phase (weeks 17 to 22). The team runs the new dbt models against historical SSIS output, compares row counts and key aggregates, and chases the diffs. This phase usually runs longer than estimated because subtle SSIS behaviors (timezone handling, null semantics, type conversions) surface only on real data.
Cutover phase (weeks 23 to 26). Downstream consumers (BI dashboards, downstream SQL Server processes, external feeds) get switched from SSIS to dbt. SSIS packages get decommissioned in phases.
Total: six months. Total cost for a 50-package estate at typical consultancy rates: $200,000 to $300,000.
What the Agentic Harness Pattern Changes
The harness compresses three of the four phases dramatically.
Discovery becomes minutes. Altimate Code’s harness can parse SSIS XML deterministically. The parser walks the package XML, extracts every dataflow, every source-destination mapping, every transformation step, and outputs a structured representation of the package’s logical content. There is no LLM in this layer. The output is correct by construction.
A 10-package estate gets parsed in well under an hour. The output is a machine-readable catalog of every transformation, ready for the translation phase.
Translation becomes prompt-driven. Given the structured representation from the parser, the LLM writes corresponding dbt models. The harness routes the LLM through schema introspection of the target warehouse (so the model’s column names match the actual Snowflake or Databricks tables), through dialect translation rules (T-SQL constructs converted to the target dialect using rule-based templates), and through anti-pattern detection (catching common translation errors before they ship).
The output is dbt models that compile, follow the standard staging-intermediate-marts layout, and include preliminary
not_nullanduniquetests on primary keys. Quality is good but not perfect. Edge cases require human review.Parity validation becomes deterministic. This is the phase that consumes the most consulting budget under the traditional pattern. The harness collapses it dramatically by running HashDiff or JoinDiff between the SSIS output and the new dbt output. Same data, row-for-row comparison via columnar hashing. Mismatches are flagged with the specific columns and rows that diverge.
The engineer then either fixes the dbt model (if the diff reveals a translation bug) or annotates the difference as expected (if the diff reflects an intentional modernization such as a timezone correction or dedup change).
Cutover remains a human decision. No harness automates the conversation with downstream stakeholders about cutover timing. This is unchanged.
A Weekend Walkthrough on a Real Package
Consider a representative 10-table SSIS package: a financial reporting dataflow. Source tables in SQL Server. Target tables in Snowflake. Five staging extracts, three intermediate transformations, two final aggregates feeding a Power BI dashboard. This is the shape of work that consumes a four-week senior engineer sprint under the traditional pattern.
Saturday morning (1 to 2 hours). Install Altimate Code via
npm install -g altimate-code. Point the harness at the SSIS package XML and the target Snowflake schema. The CLI command is roughlyaltimate-code migrate --source ssis://path/to/package.dtsx --target snowflake://schema --output dbt/. The harness parses the XML in seconds and emits a starter dbt project:dbt_project.yml, aprofiles.ymltemplate, source declarations undermodels/staging/sources.yml, five staging models, three intermediate models, two mart models, and an initialschema.ymlwith primary-key tests.Saturday afternoon (3 to 4 hours). Run
dbt buildagainst a dev Snowflake schema. Most models build clean. Two fail on column-name mismatches because the SSIS package referenced source columns by a logical name and the actual Snowflake table uses the physical name (cust_keyvscustomer_keyis the canonical example). One prompt resolves it: “Look up the actual column name for the customer key in the source table and update the staging models.” The agent invokes schema introspection, finds the actual name, updates the model, re-runs.Sunday morning (3 to 4 hours). Run parity validation.
altimate-code diff --left-source ssis-output --right-source dbt-output --keys order_id. The HashDiff report comes back: two marts match exactly, one mart has a 0.3 percent row difference. Drill in. The diff is due to a timezone conversion in the original SSIS package that the agent’s first translation missed because the SSIS XML did not explicitly mark it as a timezone operation. Fix by adding an explicitCONVERT_TIMEZONEcall to the intermediate model. Re-run the diff. It clears.Sunday afternoon (2 to 3 hours). Cleanup pass. The SSIS-generated column names are unreadable (
COL_1,Out_3). Prompt the agent to rename them based on source semantics. Generate dbt model documentation using the AI doc generator (Power User integrates this directly into VS Code). Add relationship tests between dim and fact models. Commit to git.Monday morning (2 hours). Open a pull request. Review with the original SSIS package author to confirm semantic equivalence. Address feedback. Merge. Cutover the Power BI dashboard data source from SSIS output to dbt output.
Total elapsed time: a weekend plus one morning. Total cost: one engineer’s time plus a few hundred dollars of LLM API spend. Replicate this pattern across 50 packages by parallelizing agent runs across two or three engineers and a 50-package estate completes in three to four weeks instead of six months.
What Still Requires Senior Judgment
The harness does not eliminate the need for senior engineering review. Three categories of work still require human-led decision-making.
Business logic that lives in stored procedures the agent cannot trace. If the SSIS dataflow calls a SQL Server stored procedure that does the real business calculation, the parser will identify the call but the dbt translation has to reimplement the stored procedure’s logic. The LLM can draft the reimplementation but a senior engineer should review it because business rules embedded in stored procedures often have historical context that is not obvious from the code.
T-SQL constructs without clean target-dialect equivalents. CROSS APPLY semantics, table-valued parameters, certain XML functions, MERGE syntax variants. The harness’s dialect translation rules cover the common cases. Edge cases get flagged for human review. A senior engineer decides whether the right translation is an in-line rewrite, a refactor to a different SQL shape, or a feature deferred to a later release.
Semantic layer translation. If the SSIS package was feeding a Cognos, Tableau, or Power BI model with nested business definitions (revenue recognition logic, customer segmentation rules, organizational hierarchies), the dbt rewrite needs to also produce a semantic layer (MetricFlow definitions, Cube models, or equivalent) that preserves the business meaning. The agent assists. The semantic layer design remains a human responsibility because it codifies business judgment that the SSIS XML does not capture.
Downstream coordination. Renaming a column requires updating every consumer: BI dashboards, downstream SQL Server processes, external feeds, regulatory reports. The agent does not know about those consumers unless explicitly briefed. Coordinate downstream changes manually.
The Economics in Detail
A 50-package SSIS estate represents the typical mid-market target. Traditional cost analysis: six months at $40,000 per month per senior consultant times two consultants (the standard staffing model) plus oversight and PM overhead = $200,000 to $300,000 total.
Agentic harness pattern cost analysis: three to four weeks of two senior internal engineers at internal cost rates of roughly $1,500 per day fully loaded = $45,000 to $60,000 total engineer cost. Plus Altimate Code Pro seats at $29 per seat per month, negligible at this scale. Plus LLM API costs at approximately $500 to $2,000 for the full project depending on package complexity and the model provider.
Net total: $50,000 to $65,000 versus $200,000 to $300,000. Cost reduction: 75 to 80 percent. Time reduction: 5x to 8x.
Quality outcomes are comparable or better because the parity validation is deterministic rather than sampled. Traditional consulting engagements typically validate by spot-checking key aggregates, which catches gross errors but misses subtle row-level discrepancies. HashDiff catches the row-level discrepancies that traditional validation misses.
The economic case is structural, not promotional. Engineering teams with significant SSIS estates should be running this analysis on their own books.
What This Means for the Legacy Data Modernization Market
The migration consulting market in data engineering has been one of the most profitable verticals in IT services for two decades. SSIS modernization specifically represents an estimated $2 to $4 billion annual market depending on how broadly “legacy ETL modernization” gets defined. The standard pricing model (multi-month engagements, senior-staffed teams, time-and-materials billing) is dependent on the labor-intensive nature of the work.
Agentic harness tooling reshapes the unit economics. Engagements that consumed four months of senior staff time can complete in three weeks with one to two engineers. The total addressable market does not shrink, but the revenue per engagement compresses substantially. Consultancies that adopt the harness pattern early will outbid traditional consultancies on the same work. Consultancies that do not adopt will lose deals on price.
For Fortune 500 customers, the practical effect is that SSIS modernization is no longer a board-level capital project. It is a quarterly engineering initiative. Backlogs that have sat for years because the migration estimate was too expensive are now economically tractable. Expect substantial throughput in this category through 2027 as customers realize the math has changed.
The Engineering Recommendation
For organizations with existing SSIS estates and an active modernization mandate, the right move is straightforward. Pull the latest Altimate Code. Pick one package, preferably a non-critical one for initial calibration. Run the parse-translate-validate loop end to end. Compare the time and cost to your incumbent consultant estimate.
Most teams complete this exercise within a week of senior engineer time and conclude that the harness pattern is the correct default for their full estate. The harder organizational question is then about consulting contract relationships, internal team capacity, and stakeholder coordination on cutover timing. None of those are technology problems.
The six-month consulting migration for SSIS estates is over. The new shape is a weekend project per package, with senior engineering review in the loop and deterministic parity validation at the end. Legacy ETL is finally getting unstuck.






