These anti-patterns affect the automated path from commit to production. They create manual steps, slow feedback, and fragile deployments that prevent the reliable, repeatable delivery that continuous delivery requires.
Pipeline and Infrastructure
1 - No Pipeline Exists
Category: Pipeline & Infrastructure | Quality Impact: Critical
What This Looks Like
Deploying to production requires a person. Someone opens a terminal, SSHs into a server, pulls the latest code, runs a build command, and restarts a service. Or they download an artifact from a shared drive, copy it to the right server, and run an install script. The steps live in a wiki page, a shared document, or in someone’s head. Every deployment is a manual operation performed by whoever knows the procedure.
There is no automation connecting a code commit to a running system. A developer finishes a feature, pushes to the repository, and then a separate human process begins: someone must decide it is time to deploy, gather the right artifacts, prepare the target environment, execute the deployment, and verify that it worked. Each of these steps involves manual effort and human judgment.
The deployment procedure is a craft. Certain people are known for being “good at deploys.” New team members are warned not to attempt deployments alone. When the person who knows the procedure is unavailable, deployments wait. The team has learned to treat deployment as a risky, specialized activity that requires care and experience.
Common variations:
- The deploy script on someone’s laptop. A shell script that automates some steps, but it lives on one developer’s machine. Nobody else has it. When that developer is out, the team either waits or reverse-engineers the procedure from the wiki.
- The manual checklist. A document with 30 steps: “SSH into server X, run this command, check this log file, restart this service.” The checklist is usually out of date. Steps are missing or in the wrong order. The person deploying adds corrections in the margins.
- The “only Dave can deploy” pattern. One person has the credentials, the knowledge, and the muscle memory to deploy reliably. Deployments are scheduled around Dave’s availability. Dave is a single point of failure and cannot take vacation during release weeks.
- The FTP deployment. Build artifacts are uploaded to a server via FTP, SCP, or a file share. The person deploying must know which files go where, which config files to update, and which services to restart. A missed file means a broken deployment.
- The manual build. There is no automated build at all. A developer runs the build command locally, checks that it compiles, and copies the output to the deployment target. The build that was tested is not necessarily the build that gets deployed.
The telltale sign: if deploying requires a specific person, a specific machine, or a specific document that must be followed step by step, no pipeline exists.
Why This Is a Problem
The absence of a pipeline means every deployment is a unique event. No two deployments are identical because human hands are involved in every step. This creates risk, waste, and unpredictability that compound with every release.
It reduces quality
Without a pipeline, there is no enforced quality gate between a developer’s commit and production. Tests may or may not be run before deploying. Static analysis may or may not be checked. The artifact that reaches production may or may not be the same artifact that was tested. Every “may or may not” is a gap where defects slip through.
Manual deployments also introduce their own defects. A step skipped in the checklist, a wrong version of a config file, a service restarted in the wrong order - these are deployment bugs that have nothing to do with the code. They are caused by the deployment process itself. The more manual steps involved, the more opportunities for human error.
A pipeline eliminates both categories of risk. Every commit passes through the same automated checks. The artifact that is tested is the artifact that is deployed. There are no skipped steps because the steps are encoded in the pipeline definition and execute the same way every time.
It increases rework
Manual deployments are slow, so teams batch changes to reduce deployment frequency. Batching means more changes per deployment. More changes means harder debugging when something goes wrong, because any of dozens of commits could be the cause. The team spends hours bisecting changes to find the one that broke production.
Failed manual deployments create their own rework. A deployment that goes wrong must be diagnosed, rolled back (if rollback is even possible), and re-attempted. Each re-attempt burns time and attention. If the deployment corrupted data or left the system in a partial state, the recovery effort dwarfs the original deployment.
Rework also accumulates in the deployment procedure itself. Every deployment surfaces a new edge case or a new prerequisite that was not in the checklist. Someone updates the wiki. The next deployer reads the old version. The procedure is never quite right because manual procedures cannot be versioned, tested, or reviewed the way code can.
With an automated pipeline, deployments are fast and repeatable. Small changes deploy individually. Failed deployments are rolled back automatically. The pipeline definition is code - versioned, reviewed, and tested like any other part of the system.
It makes delivery timelines unpredictable
A manual deployment takes an unpredictable amount of time. The optimistic case is 30 minutes. The realistic case includes troubleshooting unexpected errors, waiting for the right person to be available, and re-running steps that failed. A “quick deploy” can easily consume half a day.
The team cannot commit to release dates because the deployment itself is a variable. “We can deploy on Tuesday” becomes “we can start the deployment on Tuesday, and we’ll know by Wednesday whether it worked.” Stakeholders learn that deployment dates are approximate, not firm.
The unpredictability also limits deployment frequency. If each deployment takes hours of manual effort and carries risk of failure, the team deploys as infrequently as possible. This increases batch size, which increases risk, which makes deployments even more painful, which further discourages frequent deployment. The team is trapped in a cycle where the lack of a pipeline makes deployments costly, and costly deployments make the lack of a pipeline seem acceptable.
An automated pipeline makes deployment duration fixed and predictable. A deploy takes the same amount of time whether it happens once a month or ten times a day. The cost per deployment drops to near zero, removing the incentive to batch.
It concentrates knowledge in too few people
When deployment is manual, the knowledge of how to deploy lives in people rather than in code. The team depends on specific individuals who know the servers, the credentials, the order of operations, and the workarounds for known issues. These individuals become bottlenecks and single points of failure.
When the deployment expert is unavailable - sick, on vacation, or no longer with the company - the team is stuck. Someone else must reconstruct the deployment procedure from incomplete documentation and trial and error. Deployments attempted by inexperienced team members fail at higher rates, which reinforces the belief that only experts should deploy.
A pipeline encodes deployment knowledge in an executable definition that anyone can run. New team members deploy on their first day by triggering the pipeline. The deployment expert’s knowledge is preserved in code rather than in their head. The bus factor for deployments moves from one to the entire team.
Impact on continuous delivery
Continuous delivery requires an automated, repeatable pipeline that can take any commit from trunk and deliver it to production with confidence. Without a pipeline, none of this is possible. There is no automation to repeat. There is no confidence that the process will work the same way twice. There is no path from commit to production that does not require a human to drive it.
The pipeline is not an optimization of manual deployment. It is a prerequisite for CD. A team without a pipeline cannot practice CD any more than a team without source control can practice version management. The pipeline is the foundation. Everything else - automated testing, deployment strategies, progressive rollouts, fast rollback - depends on it existing.
How to Fix It
Step 1: Document the current manual process exactly (Week 1)
Before automating, capture what the team actually does today. Have the person who deploys most often write down every step in order:
- What commands do they run?
- What servers do they connect to?
- What credentials do they use?
- What checks do they perform before, during, and after?
- What do they do when something goes wrong?
This document is not the solution - it is the specification for the first version of the pipeline. Every manual step will become an automated step.
Step 2: Automate the build (Week 2)
Start with the simplest piece: turning source code into a deployable artifact without manual intervention.
- Choose a CI server (Jenkins, GitHub Actions, GitLab CI, CircleCI, or any tool that triggers on commit).
- Configure it to check out the code and run the build command on every push to trunk.
- Store the build output as a versioned artifact.
At this point, the team has an automated build but still deploys manually. That is fine. The pipeline will grow incrementally.
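A concrete sketch helps make this less abstract. The script below is a minimal build stage that a CI server, a webhook, or even a cron job could run on every push to trunk; the `make build` command, the artifact path, and the filenames are placeholders for whatever your project actually uses.

```python
#!/usr/bin/env python3
"""Minimal build stage, triggered on every push to trunk.
All commands and paths are placeholders."""
import shutil
import subprocess
import sys
from pathlib import Path

ARTIFACT_STORE = Path("/srv/artifacts")  # assumed shared artifact location


def git_sha() -> str:
    return subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def main() -> int:
    # 1. Update the working copy of trunk and build it.
    subprocess.run(["git", "pull", "--ff-only"], check=True)
    subprocess.run(["make", "build"], check=True)  # replace with your build command

    # 2. Store the output as a versioned artifact, named by commit SHA.
    dest = ARTIFACT_STORE / f"app-{git_sha()}.tar.gz"
    shutil.copy("build/app.tar.gz", dest)
    print(f"Published artifact {dest}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```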
Step 3: Add automated tests to the build (Week 3)
If the team has any automated tests, add them to the pipeline so they run after the build succeeds. If the team has no automated tests, add one. A single test that verifies the application starts up is more valuable than zero tests.
The pipeline should now fail if the build fails or if any test fails. This is the first automated quality gate. No artifact is produced unless the code compiles and the tests pass.
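If the team is starting from zero tests, that single startup check can be as small as the sketch below. It assumes the application can be launched with a local command and answers on a `/health` endpoint; both the command and the URL are placeholders.

```python
"""A single startup test: worth more than zero tests. `./run-app` and the
health URL are placeholders for your own start command and endpoint."""
import subprocess
import time
import urllib.request


def test_application_starts():
    proc = subprocess.Popen(["./run-app"])
    try:
        # Poll the health endpoint for up to 30 seconds.
        deadline = time.time() + 30
        while time.time() < deadline:
            try:
                with urllib.request.urlopen("http://localhost:8080/health", timeout=2) as resp:
                    assert resp.status == 200
                    return
            except OSError:
                time.sleep(1)
        raise AssertionError("application did not become healthy within 30s")
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```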
Step 4: Automate the deployment to a non-production environment (Weeks 3-4)
Take the manual deployment steps from Step 1 and encode them in a script or pipeline stage that deploys the tested artifact to a staging or test environment:
- Provision or configure the target environment.
- Deploy the artifact.
- Run a smoke test to verify the deployment succeeded.
The team now has a pipeline that builds, tests, and deploys to a non-production environment on every commit. Deployments to this environment should happen without any human intervention.
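A sketch of this stage follows, with every tool name, hostname, and path as a placeholder. The point is that provision, deploy, and verify become one script the pipeline runs, not three things a person remembers to do.

```python
"""Sketch of a non-production deployment stage: provision, deploy, verify.
All commands and hostnames are placeholders for your own tooling."""
import subprocess
import sys

STAGING_HOST = "staging.internal"  # assumed target environment


def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def deploy_to_staging(artifact: str) -> None:
    # 1. Provision or configure the target environment.
    run("ansible-playbook", "-l", STAGING_HOST, "provision.yml")
    # 2. Deploy the tested artifact.
    run("scp", artifact, f"deploy@{STAGING_HOST}:/opt/app/releases/")
    run("ssh", f"deploy@{STAGING_HOST}", "sudo systemctl restart app")
    # 3. Smoke test: fail the stage loudly if the service is not healthy.
    run("curl", "--fail", "--retry", "10", "--retry-delay", "3",
        "--retry-connrefused", f"https://{STAGING_HOST}/health")


if __name__ == "__main__":
    deploy_to_staging(sys.argv[1])
```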
Step 5: Extend the pipeline to production (Weeks 5-6)
Once the team trusts the automated deployment to non-production environments, extend it to production:
- Add a manual approval gate if the team is not yet comfortable with fully automated production deployments. This is a temporary step - the goal is to remove it later.
- Use the same deployment script and process for production that you use for non-production. The only difference should be the target environment and its configuration.
- Add post-deployment verification: health checks, smoke tests, or basic monitoring checks that confirm the deployment is healthy.
The first automated production deployment will be nerve-wracking. That is normal. Run it alongside the manual process the first few times: deploy automatically, then verify manually. As confidence grows, drop the manual verification.
Step 6: Address the objections (Ongoing)
| Objection | Response |
|---|---|
| “Our deployments are too complex to automate” | If a human can follow the steps, a script can execute them. Complex deployments benefit the most from automation because they have the most opportunities for human error. |
| “We don’t have time to build a pipeline” | You are already spending time on every manual deployment. A pipeline is an investment that pays back on the second deployment and every deployment after. |
| “Only Dave knows how to deploy” | That is the problem, not a reason to keep the status quo. Building the pipeline captures Dave’s knowledge in code. Dave should lead the pipeline effort because he knows the procedure best. |
| “What if the pipeline deploys something broken?” | The pipeline includes automated tests and can include approval gates. A broken deployment from a pipeline is no worse than a broken deployment from a human - and the pipeline can roll back automatically. |
| “Our infrastructure doesn’t support modern CI/CD tools” | Start with a shell script triggered by a cron job or a webhook. A pipeline does not require Kubernetes or cloud-native infrastructure. It requires automation of the steps you already perform manually. |
Measuring Progress
| Metric | What to look for |
|---|---|
| Manual steps in the deployment process | Should decrease to zero |
| Deployment duration | Should decrease and stabilize as manual steps are automated |
| Release frequency | Should increase as deployment cost drops |
| Deployment failure rate | Should decrease as human error is removed |
| People who can deploy to production | Should increase from one or two to the entire team |
| Lead time | Should decrease as the manual deployment bottleneck is eliminated |
Related Content
- Build Automation - The first step in building a pipeline
- Pipeline Architecture - How to structure a pipeline from commit to production
- Single Path to Production - Every change follows the same automated path
- Everything as Code - Pipeline definitions, infrastructure, and deployment procedures belong in version control
- Identify Constraints - The absence of a pipeline is often the primary constraint on delivery
2 - Manual Deployments
Category: Pipeline & Infrastructure | Quality Impact: High
What This Looks Like
The team has a CI server. Code is built and tested automatically on every push. The pipeline dashboard is green. But between “pipeline passed” and “code running in production,” there is a person. Someone must log into a deployment tool, click a button, select the right artifact, choose the right environment, and watch the output scroll by. Or they SSH into servers, pull the artifact, run migration scripts, restart services, and verify health checks - all by hand.
The team may not even think of this as a problem. The build is automated. The tests run automatically. Deployment is “just the last step.” But that last step takes 30 minutes to an hour of focused human attention, can only happen when the right person is available, and fails often enough that nobody wants to do it on a Friday afternoon.
Deployment has its own rituals. The team announces in Slack that a deploy is starting. Other developers stop merging. Someone watches the logs. Another person checks the monitoring dashboard. When it is done, someone posts a confirmation. The whole team holds its breath during the process and exhales when it works. This ceremony happens every time, whether the release is one commit or fifty.
Common variations:
- The button-click deploy. The CI/CD tool has a “deploy to production” button, but a human must click it and then monitor the result. The automation exists but is not trusted to run unattended. Someone watches every deployment from start to finish.
- The runbook deploy. A document describes the deployment steps in order. The deployer follows the runbook, executing commands manually at each step. The runbook was written months ago and has handwritten corrections in the margins. Some steps have been added, others crossed out.
- The SSH-and-pray deploy. The deployer SSHs into each server individually, pulls code or copies artifacts, runs scripts, and restarts services. The order matters. Missing a server means a partial deployment. The deployer keeps a mental checklist of which servers are done.
- The release coordinator deploy. One person coordinates the deployment across multiple systems. They send messages to different teams: “deploy service A now,” “run the database migration,” “restart the cache.” The deployment is a choreographed multi-person event.
- The after-hours deploy. Deployments happen only outside business hours because the manual process is risky enough that the team wants minimal user traffic. Deployers work evenings or weekends. The deployment window is sacred and stressful.
The telltale sign: if the pipeline is green but the team still needs to “do a deploy” as a separate activity, deployment is manual.
Why This Is a Problem
A manual deployment negates much of the value that an automated build and test pipeline provides. The pipeline can validate code in minutes, but if the last mile to production requires a human, the delivery speed is limited by that human’s availability, attention, and reliability.
It reduces quality
Manual deployment introduces a category of defects that have nothing to do with the code. A deployer who runs migration scripts in the wrong order corrupts data. A deployer who forgets to update a config file on one of four servers creates inconsistent behavior. A deployer who restarts services too quickly triggers a cascade of connection errors. These are process defects - bugs introduced by the deployment method, not the software.
Manual deployments also degrade the quality signal from the pipeline. The pipeline tests a specific artifact in a specific configuration. If the deployer manually adjusts configuration, selects a different artifact version, or skips a verification step, the deployed system no longer matches what the pipeline validated. The pipeline said “this is safe to deploy,” but what actually reached production is something slightly different.
Automated deployment eliminates process defects by executing the same steps in the same order every time. The artifact the pipeline tested is the artifact that reaches production. Configuration is applied from version-controlled definitions, not from human memory. The deployment is identical whether it happens at 2 PM on Tuesday or 3 AM on Saturday.
It increases rework
Because manual deployments are slow and risky, teams batch changes. Instead of deploying each commit individually, they accumulate a week or two of changes and deploy them together. When something breaks in production, the team must determine which of thirty commits caused the problem. This diagnosis takes hours. The fix takes more hours. If the fix itself requires a deployment, the team must go through the manual process again.
Failed deployments are especially costly. A manual deployment that leaves the system in a broken state requires manual recovery. The deployer must diagnose what went wrong, decide whether to roll forward or roll back, and execute the recovery steps by hand. If the deployment was a multi-server process and some servers are on the new version while others are on the old version, the recovery is even harder. The team may spend more time recovering from a failed deployment than they spent on the deployment itself.
With automated deployments, each commit deploys individually. When something breaks, the cause is obvious - it is the one commit that just deployed. Rollback is a single action, not a manual recovery effort. The time from “something is wrong” to “the previous version is running” is minutes, not hours.
It makes delivery timelines unpredictable
The gap between “pipeline is green” and “code is in production” is measured in human availability. If the deployer is in a meeting, the deployment waits. If the deployer is on vacation, the deployment waits longer. If the deployment fails and the deployer needs help, the recovery depends on who else is around.
This human dependency makes release timing unpredictable. The team cannot promise “this fix will be in production in 30 minutes” because the deployment requires a person who may not be available for hours. Urgent fixes wait for deployment windows. Critical patches wait for the release coordinator to finish lunch.
The batching effect adds another layer of unpredictability. When teams batch changes to reduce deployment frequency, each deployment becomes larger and riskier. Larger deployments take longer to verify and are more likely to fail. The team cannot predict how long the deployment will take because they cannot predict what will go wrong with a batch of thirty changes.
Automated deployment makes the time from “pipeline green” to “running in production” fixed and predictable. It takes the same number of minutes regardless of who is available, what day it is, or how many other things are happening. The team can promise delivery timelines because the deployment is a deterministic process, not a human activity.
It prevents fast recovery
When production breaks, speed of recovery determines the blast radius. A team that can deploy a fix in five minutes limits the damage. A team that needs 45 minutes of manual deployment work exposes users to the problem for 45 minutes plus diagnosis time.
Manual rollback is even worse. Many teams with manual deployments have no practiced rollback procedure at all. “Rollback” means “re-deploy the previous version,” which means running the entire manual deployment process again with a different artifact. If the deployment process takes an hour, rollback takes an hour. If the deployment process requires a specific person, rollback requires that same person.
Some manual deployments cannot be cleanly rolled back. Database migrations that ran during the deployment may not have reverse scripts. Config changes applied to servers may not have been tracked. The team is left doing a forward fix under pressure, manually deploying a patch through the same slow process that caused the problem.
Automated pipelines with automated rollback can revert to the previous version in minutes. The rollback follows the same tested path as the deployment. No human judgment is required. The team’s mean time to repair drops from hours to minutes.
Impact on continuous delivery
Continuous delivery means any commit that passes the pipeline can be released to production at any time with confidence. Manual deployment breaks this definition at “at any time.” The commit can only be released when a human is available to perform the deployment, when the deployment window is open, and when the team is ready to dedicate attention to watching the process.
The manual deployment step is the bottleneck that limits everything upstream. The pipeline can validate commits in 10 minutes, but if deployment takes an hour of human effort, the team will never deploy more than a few times per day at best. In practice, teams with manual deployments release weekly or biweekly because the deployment overhead makes anything more frequent impractical.
The pipeline is only half the delivery system. Automating the build and tests without automating the deployment is like paving a highway that ends in a dirt road. The speed of the paved section is irrelevant if every journey ends with a slow, bumpy last mile.
How to Fix It
Step 1: Script the current manual process (Week 1)
Take the runbook, the checklist, or the knowledge in the deployer’s head and turn it into a script. Do not redesign the process yet - just encode what the team already does.
- Record a deployment from start to finish. Note every command, every server, every check.
- Write a script that executes those steps in order.
- Store the script in version control alongside the application code.
The script will be rough. It will have hardcoded values and assumptions. That is fine. The goal is to make the deployment reproducible by any team member, not to make it perfect.
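A first pass might look like the sketch below. Every hostname, path, and command is a placeholder standing in for whatever the runbook actually says. Hardcoded values are acceptable at this stage; the win is that any team member can now run the deployment.

```python
"""First pass at encoding the runbook: rough, hardcoded, and that's fine.
Every hostname, path, and command is a placeholder."""
import subprocess

# Steps copied from the runbook, in order.
RUNBOOK_STEPS = [
    "ssh deploy@app-1.prod 'sudo systemctl stop app'",
    "scp build/app.tar.gz deploy@app-1.prod:/opt/app/releases/",
    "ssh deploy@app-1.prod 'cd /opt/app && tar xzf releases/app.tar.gz'",
    "ssh deploy@app-1.prod './migrate.sh'",
    "ssh deploy@app-1.prod 'sudo systemctl start app'",
    "curl --fail https://app-1.prod/health",
]


def main() -> None:
    for step in RUNBOOK_STEPS:
        print(f"==> {step}")
        # Stop immediately if any step fails, exactly as the runbook
        # tells a human deployer to do.
        subprocess.run(step, shell=True, check=True)


if __name__ == "__main__":
    main()
```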
Step 2: Run the script from the pipeline (Week 2)
Connect the deployment script to the CI/CD pipeline so it runs automatically after the build and tests pass. Start with a non-production environment:
- Add a deployment stage to the pipeline that targets a staging or test environment.
- Trigger it automatically on every successful build.
- Add a smoke test after deployment to verify it worked.
The team now gets automatic deployments to a non-production environment on every commit. This builds confidence in the automation and surfaces problems early.
Step 3: Externalize configuration and secrets (Weeks 2-3)
Manual deployments often involve editing config files on servers or passing environment-specific values by hand. Move these out of the manual process:
- Store environment-specific configuration in a config management system or environment variables managed by the pipeline.
- Move secrets to a secrets manager (Vault, AWS Secrets Manager, Azure Key Vault, or even encrypted pipeline variables as a starting point).
- Ensure the deployment script reads configuration from these sources rather than from hardcoded values or manual input.
This step is critical because manual configuration is one of the most common sources of deployment failures. Automating deployment without automating configuration just moves the manual step.
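A minimal sketch of what "reads configuration from these sources" means in practice - the variable names are illustrative, and the pipeline or secrets-manager integration is responsible for setting them:

```python
"""Configuration read from the environment rather than from files edited by
hand on servers. Variable names are illustrative."""
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    environment: str   # "staging" or "production"
    database_url: str  # injected by the pipeline from config management
    api_key: str       # injected from the secrets manager, never hardcoded


def load_config() -> DeployConfig:
    # Fail fast with a clear error if anything is missing, instead of
    # deploying a half-configured system.
    try:
        return DeployConfig(
            environment=os.environ["DEPLOY_ENV"],
            database_url=os.environ["DATABASE_URL"],
            api_key=os.environ["API_KEY"],
        )
    except KeyError as missing:
        raise SystemExit(f"Missing required configuration: {missing}")
```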
Step 4: Automate production deployment with a gate (Weeks 3-4)
Extend the pipeline to deploy to production using the same script and process:
- Add a production deployment stage after the non-production deployment succeeds.
- Include a manual approval gate - a button that a team member clicks to authorize the production deployment. This is a temporary safety net while the team builds confidence.
- Add post-deployment health checks that automatically verify the deployment succeeded.
- Add automated rollback that triggers if the health checks fail.
The approval gate means a human still decides when to deploy, but the deployment itself is fully automated. No SSHing. No manual steps. No watching logs scroll by.
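The production stage can be sketched roughly as below. The `deploy.py` script, the health-check URL, and the version arguments are placeholders; the point is that deploy, verify, and rollback form one automated sequence, with the approval gate as the only human touch point.

```python
"""Sketch of a production stage: automated deploy, automated verification,
automated rollback. Script names and URLs are placeholders."""
import subprocess
import time
import urllib.request

HEALTH_URL = "https://app.example.com/health"  # placeholder


def deploy_version(version: str) -> None:
    # Same deployment script used for staging, pointed at production.
    subprocess.run(
        ["./deploy.py", "--env", "production", "--version", version],
        check=True,
    )


def check_health(attempts: int = 10, delay: int = 6) -> bool:
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass
        time.sleep(delay)
    return False


def release(new_version: str, previous_version: str) -> None:
    deploy_version(new_version)
    if not check_health():
        print(f"Health checks failed; rolling back to {previous_version}")
        deploy_version(previous_version)
        raise SystemExit(1)
    print(f"{new_version} is live and healthy")
```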
Step 5: Remove the manual gate (Weeks 6-8)
Once the team has seen the automated production deployment succeed repeatedly, remove the manual approval gate. The pipeline now deploys to production automatically when all checks pass.
This is the hardest step emotionally. The team will resist. Expect these objections:
| Objection | Response |
|---|---|
| “We need a human to decide when to deploy” | Why? If the pipeline validates the code and the deployment process is automated and tested, what decision is the human making? If the answer is “checking that nothing looks weird,” that check should be automated. |
| “What if it deploys during peak traffic?” | Use deployment windows in the pipeline configuration, or use progressive rollout strategies that limit blast radius regardless of traffic. |
| “We had a bad deployment last month” | Was it caused by the automation or by a gap in testing? If the tests missed a defect, the fix is better tests, not a manual gate. If the deployment process itself failed, the fix is better deployment automation, not a human watching. |
| “Compliance requires manual approval” | Review the actual compliance requirement. Most require evidence of approval, not a human clicking a button at deployment time. A code review approval, an automated policy check, or an audit log of the pipeline run often satisfies the requirement. |
| “Our deployments require coordination with other teams” | Automate the coordination. Use API contracts, deployment dependencies in the pipeline, or event-based triggers. If another team must deploy first, encode that dependency rather than coordinating in Slack. |
Step 6: Add deployment observability (Ongoing)
Once deployments are automated, invest in knowing whether they worked:
- Monitor error rates, latency, and key business metrics after every deployment.
- Set up automatic rollback triggers tied to these metrics.
- Track deployment frequency, duration, and failure rate over time.
The team should be able to deploy without watching. The monitoring watches for them.
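Tracking deployment frequency, duration, and failure rate does not require a metrics platform on day one. The sketch below shows the simplest possible approach - append one event per deployment to a log the team can query later; the filename and fields are illustrative.

```python
"""Record one event per deployment so frequency, duration, and failure rate
can be tracked over time. A metrics system or database works the same way."""
import json
import time
from datetime import datetime, timezone

DEPLOY_LOG = "deployments.jsonl"  # placeholder destination


def record_deployment(version: str, started_at: float, succeeded: bool) -> None:
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": version,
        "duration_seconds": round(time.time() - started_at, 1),
        "succeeded": succeeded,
    }
    with open(DEPLOY_LOG, "a") as log:
        log.write(json.dumps(event) + "\n")
```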
Measuring Progress
| Metric | What to look for |
|---|---|
| Manual steps per deployment | Should reach zero |
| Deployment duration (human time) | Should drop from hours to zero - the pipeline does the work |
| Release frequency | Should increase as deployment friction drops |
| Change fail rate | Should decrease as manual process defects are eliminated |
| Mean time to repair | Should decrease as rollback becomes automated |
| Lead time | Should decrease as the deployment bottleneck is removed |
Related Content
- Pipeline Architecture - How to structure a pipeline that includes deployment
- Single Path to Production - Every change follows the same automated path through the same pipeline
- Rollback - Automated rollback depends on automated deployment
- Everything as Code - Deployment scripts, configuration, and infrastructure belong in version control
- No Pipeline Exists - If the build is also manual, start there first
3 - Snowflake Environments
Category: Pipeline & Infrastructure | Quality Impact: High
What This Looks Like
Staging has a different version of the database than production. The dev environment has a library installed that nobody remembers adding. Production has a configuration file that was edited by hand six months ago during an incident and never committed to source control. Nobody is sure all three environments are running the same OS patch level.
A developer asks “why does this work in staging but not in production?” The answer takes hours to find because it requires comparing configurations across environments by hand - diffing config files, checking installed packages, verifying environment variables one by one.
Common variations:
- The hand-built server. Someone provisioned the production server two years ago. They followed a wiki page that has since been edited, moved, or deleted. Nobody has provisioned a new one since. If the server dies, nobody is confident they can recreate it.
- The magic SSH session. During an incident, someone SSHed into production and changed a config value. It fixed the problem. Nobody updated the deployment scripts, the infrastructure code, or the documentation. The next deployment overwrites the fix - or doesn’t, depending on which files the deployment touches.
- The shared dev environment. A single development or staging environment is shared by the whole team. One developer installs a library, another changes a config value, a third adds a cron job. The environment drifts from any known baseline within weeks.
- The “production is special” mindset. Dev and staging environments are provisioned with scripts, but production was set up differently because of “security requirements” or “scale differences.” The result is that the environments the team tests against are structurally different from the one that serves users.
- The environment with a name. Environments have names like “staging-v2” or “qa-new” because someone created a new one alongside the old one. Both still exist. Nobody is sure which one the pipeline deploys to.
The telltale sign: deploying the same artifact to two environments produces different results, and the team’s first instinct is to check environment configuration rather than application code.
Why This Is a Problem
Snowflake environments undermine the fundamental premise of testing: that the behavior you observe in one environment predicts the behavior you will see in another. When every environment is unique, testing in staging tells you what works in staging - nothing more.
It reduces quality
When environments differ, bugs hide in the gaps. An application that works in staging may fail in production because of a different library version, a missing environment variable, or a filesystem permission that was set by hand. These bugs are invisible to testing because the test environment does not reproduce the conditions that trigger them.
The team learns this the hard way, one production incident at a time. Each incident teaches the team that “passed in staging” does not mean “will work in production.” This erodes trust in the entire testing and deployment process. Developers start adding manual verification steps - checking production configs by hand before deploying, running smoke tests manually after deployment, asking the ops team to “keep an eye on things.”
When environments are identical and provisioned from the same code, the gap between staging and production disappears. What works in staging works in production because the environments are the same. Testing produces reliable results.
It increases rework
Snowflake environments cause two categories of rework. First, developers spend hours debugging environment-specific issues that have nothing to do with application code. “Why does this work on my machine but not in CI?” leads to comparing configurations, googling error messages related to version mismatches, and patching environments by hand. This time is pure waste.
Second, production incidents caused by environment drift require investigation, rollback, and fixes to both the application and the environment. A configuration difference that causes a production failure might take five minutes to fix once identified, but identifying it takes hours because nobody knows what the correct configuration should be.
Teams with reproducible environments spend almost no time on environment debugging. If an environment is wrong, they destroy it and recreate it from code. The investigation time drops from hours to minutes.
It makes delivery timelines unpredictable
Deploying to a snowflake environment is unpredictable because the environment itself is an unknown variable. The same deployment might succeed on Monday and fail on Friday because someone changed something in the environment between the two deploys. The team cannot predict how long a deployment will take because they cannot predict what environment issues they will encounter.
This unpredictability compounds across environments. A change must pass through dev, staging, and production, and each environment is a unique snowflake with its own potential for surprise. A deployment that should take minutes takes hours because each environment reveals a new configuration issue.
Reproducible environments make deployment time a constant. The same artifact deployed to the same environment specification produces the same result every time. Deployment becomes a predictable step in the pipeline rather than an adventure.
It makes environments a scarce resource
When environments are hand-configured, creating a new one is expensive. It takes hours or days of manual work. The team has a small number of shared environments and must coordinate access. “Can I use staging today?” becomes a daily question. Teams queue up for access to the one environment that resembles production.
This scarcity blocks parallel work. Two developers who both need to test a database migration cannot do so simultaneously if there is only one staging environment. One waits while the other finishes. Features that could be validated in parallel are serialized through a shared environment bottleneck.
When environments are defined as code, spinning up a new one is a pipeline step that takes minutes. Each developer or feature branch can have its own environment. There is no contention because environments are disposable and cheap.
Impact on continuous delivery
Continuous delivery requires that any change can move from commit to production through a fully automated pipeline. Snowflake environments break this in multiple ways. The pipeline cannot provision environments automatically if environments are hand-configured. Testing results are unreliable because environments differ. Deployments fail unpredictably because of configuration drift.
A team with snowflake environments cannot trust their pipeline. They cannot deploy frequently because each deployment risks hitting an environment-specific issue. They cannot automate fully because the environments require manual intervention. The path from commit to production is neither continuous nor reliable.
How to Fix It
Step 1: Document what exists today (Week 1)
Before automating anything, capture the current state of each environment:
- For each environment (dev, staging, production), record: OS version, installed packages, configuration files, environment variables, external service connections, and any manual customizations.
- Diff the environments against each other. Note every difference.
- Classify each difference as intentional (e.g., production uses a larger instance size) or accidental (e.g., staging has an old library version nobody updated).
This audit surfaces the drift. Most teams are surprised by how many accidental differences exist.
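The package-level part of this audit can be scripted rather than done by hand. The sketch below assumes Debian-style hosts reachable over SSH; substitute `rpm -qa`, `pip freeze`, or whatever your platform uses, and the hostnames are placeholders.

```python
"""Diff the installed-package lists of two environments. Hostnames and the
dpkg-query command are assumptions for Debian-style hosts."""
import subprocess

ENVIRONMENTS = {
    "staging": "deploy@staging.internal",
    "production": "deploy@prod.internal",
}


def installed_packages(host: str) -> set[str]:
    out = subprocess.run(
        ["ssh", host, "dpkg-query -W -f='${Package}=${Version}\\n'"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.splitlines())


def main() -> None:
    staging = installed_packages(ENVIRONMENTS["staging"])
    production = installed_packages(ENVIRONMENTS["production"])
    print("Only in staging:", sorted(staging - production))
    print("Only in production:", sorted(production - staging))


if __name__ == "__main__":
    main()
```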
Step 2: Define one environment specification (Weeks 2-3)
Choose an infrastructure-as-code tool (Terraform, Pulumi, CloudFormation, Ansible, or similar) and write a specification for one environment. Start with the environment you understand best - usually staging.
The specification should define:
- Base infrastructure (servers, containers, networking)
- Installed packages and their versions
- Configuration files and their contents
- Environment variables with placeholder values
- Any scripts that run at provisioning time
Verify the specification by destroying the staging environment and recreating it from code. If the recreated environment works, the specification is correct. If it does not, fix the specification until it does.
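If the chosen tool is Pulumi with Python, the start of such a specification might look like the sketch below. The resource, AMI ID, and provisioning script are placeholders; what matters is that the environment is declared in code that lives in version control.

```python
"""Sketch of an environment specification in Pulumi (Python). Resource names,
the AMI ID, and the user-data script are placeholders."""
import pulumi
import pulumi_aws as aws

# Base infrastructure: one application server. Networking, storage, and
# additional services would be declared the same way.
server = aws.ec2.Instance(
    "app-server",
    ami="ami-0abcdef1234567890",            # placeholder AMI
    instance_type="t3.medium",
    user_data=open("provision.sh").read(),  # installs packages, writes config
    tags={"environment": "staging", "managed-by": "pulumi"},
)

pulumi.export("app_server_ip", server.public_ip)
```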
Step 3: Parameterize for environment differences (Week 3)
Intentional differences between environments (instance sizes, database connection strings, API keys) become parameters, not separate specifications. One specification with environment-specific variables:
| Parameter | Dev | Staging | Production |
|---|---|---|---|
| Instance size | small | medium | large |
| Database host | dev-db.internal | staging-db.internal | prod-db.internal |
| Log level | debug | info | warn |
| Replica count | 1 | 2 | 3 |
The structure is identical. Only the values change. This eliminates accidental drift because every environment is built from the same template.
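Continuing the Pulumi sketch from the previous step (still an assumption about tooling), the table above can be expressed as per-stack configuration: one program, one set of keys, and a Pulumi.&lt;stack&gt;.yaml file per environment supplying the values.

```python
"""One specification parameterized per environment via Pulumi stack config.
Key names are illustrative."""
import pulumi

config = pulumi.Config()

instance_size = config.require("instanceSize")   # small / medium / large
database_host = config.require("databaseHost")
log_level = config.get("logLevel") or "info"
replica_count = config.require_int("replicaCount")

# These values feed the same resource definitions in every environment; only
# `pulumi stack select dev|staging|production` changes which config file
# supplies them.
```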
Step 4: Provision environments through the pipeline (Week 4)
Add environment provisioning to the deployment pipeline:
- Before deploying to an environment, the pipeline provisions (or updates) it from the infrastructure code.
- The application artifact is deployed to the freshly provisioned environment.
- If provisioning or deployment fails, the pipeline fails - no manual intervention.
This closes the loop. Environments cannot drift because they are recreated or reconciled on every deployment. Manual SSH sessions and hand edits have no lasting effect because the next pipeline run overwrites them.
Step 5: Make environments disposable (Week 5+)
The ultimate goal is that any environment can be destroyed and recreated in minutes with no data loss and no human intervention:
- Practice destroying and recreating staging weekly. This verifies the specification stays accurate and builds team confidence.
- Provision ephemeral environments for feature branches or pull requests. Let the pipeline create and destroy them automatically (a sketch follows this list).
- If recreating production is not feasible yet (stateful systems, licensing), ensure you can provision a production-identical environment for testing at any time.
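As a sketch of the ephemeral-environment idea from the list above, Pulumi's Automation API (assuming Pulumi is the tool in use) lets the pipeline create a stack per pull request, test against it, and destroy it afterwards. The stack naming and the acceptance-test hook are placeholders.

```python
"""Disposable per-pull-request environments via Pulumi's Automation API.
Stack naming and the test hook are placeholders."""
from pulumi import automation as auto


def ephemeral_environment(pr_number: int) -> None:
    stack = auto.create_or_select_stack(
        stack_name=f"pr-{pr_number}",
        work_dir=".",  # directory containing the environment specification
    )
    stack.up(on_output=print)            # create the environment
    try:
        run_acceptance_tests(stack)      # hypothetical test hook
    finally:
        stack.destroy(on_output=print)   # tear it down, pass or fail
        stack.workspace.remove_stack(f"pr-{pr_number}")


def run_acceptance_tests(stack: auto.Stack) -> None:
    # Placeholder: point your test suite at the stack's exported endpoints.
    outputs = stack.outputs()
    print("Testing against", outputs.get("app_server_ip"))
```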
| Objection | Response |
|---|---|
| “Production has unique requirements we can’t codify” | If a requirement exists only in production and is not captured in code, it is at risk of being lost. Codify it. If it is truly unique, it belongs in a parameter, not a hand-edit. |
| “We don’t have time to learn infrastructure-as-code” | You are already spending that time debugging environment drift. The investment pays for itself within weeks. Start with the simplest tool that works for your platform. |
| “Our environments are managed by another team” | Work with them. Provide the specification. If they provision from your code, you both benefit: they have a reproducible process and you have predictable environments. |
| “Containers solve this problem” | Containers solve application-level consistency. You still need infrastructure-as-code for the platform the containers run on - networking, storage, secrets, load balancers. Containers are part of the solution, not the whole solution. |
Measuring Progress
| Metric | What to look for |
|---|---|
| Environment provisioning time | Should decrease from hours/days to minutes |
| Configuration differences between environments | Should reach zero accidental differences |
| “Works in staging but not production” incidents | Should drop to near zero |
| Change fail rate | Should decrease as environment parity improves |
| Mean time to repair | Should decrease as environments become reproducible |
| Time spent debugging environment issues | Track informally - should approach zero |
Related Content
- Everything as Code - Infrastructure, configuration, and environments defined in source control
- Production-Like Environments - Ensuring test environments match production
- Pipeline Architecture - How environments fit into the deployment pipeline
- No Pipeline Exists - Snowflake environments often coexist with manual deployment processes
- Deterministic Pipeline - A pipeline that gives the same answer every time requires identical environments