Stay in Sync: How to Detect Infrastructure Drift in Terraform Using CI/CD
Why Drift Detection Matters
Using Terraform to create cloud workloads has helped many teams transition away from a Click-Ops culture, where infrastructure is manually configured through cloud provider consoles and portals. Over the years, I’ve found that one of the biggest challenges for organizations on their journey out of Click-Ops and into automation is simply breaking teams out of the habit of doing things in the portal.
The result is often drift: the actual state of infrastructure in the cloud no longer matches the desired state defined in Terraform.
Undetected drift can lead to:
Security risks - Unapproved changes can introduce vulnerabilities.
Compliance violations - When you operate in a highly regulated industry, compliance violations can be especially costly.
Developer confusion - “What happened to my change?”
There are a few ways to keep Terraform as the source of truth. The primary method is to set up Role-Based Access Control (RBAC) policies so that team members no longer have the portal access they once did. In practice, that’s hard to pull off, especially at the beginning of the journey, because there may be valid use cases for people to have access to resources, particularly in lower environments.
Today we’re going to look at how we can use GitHub Workflows to help us out with this problem. We’ll set up a Workflow that notifies the team when drift is detected, which over time will help shift the team’s mentality away from Click-Ops and toward making changes in the source code. We’ll be targeting Azure, but the approach is the same for AWS or GCP.
Detecting Drift with GitHub Actions
GitHub Actions provides a simple way to run Terraform drift detection on a schedule. Below is a workflow that runs terraform plan in a repository and checks for differences.
name: "Detect Drift in Infrastructure"

on:
  schedule:
    - cron: "0 12 * * *"   # daily at 12:00 UTC
  workflow_dispatch:

permissions: write-all

jobs:
  terraform-drift-check:
    name: "Terraform Drift Detection"
    runs-on: ubuntu-latest
    environment: main
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.4

      - name: Azure Login
        uses: Azure/login@v1.5.0
        with:
          client-id: ${{ vars.ARM_CLIENT_ID }}
          tenant-id: ${{ vars.ARM_TENANT_ID }}
          subscription-id: ${{ vars.ARM_SUBSCRIPTION_ID }}

      - name: Terraform Init
        env:
          ARM_CLIENT_ID: ${{ vars.ARM_CLIENT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ vars.ARM_SUBSCRIPTION_ID }}
          ARM_TENANT_ID: ${{ vars.ARM_TENANT_ID }}
        run: terraform init -backend-config="backend.conf"

      # Exit code 2 from the plan means changes (drift) were detected.
      - name: Terraform Plan
        id: plan
        env:
          ARM_CLIENT_ID: ${{ vars.ARM_CLIENT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ vars.ARM_SUBSCRIPTION_ID }}
          ARM_TENANT_ID: ${{ vars.ARM_TENANT_ID }}
          TF_VAR_azure_subscription_id: ${{ vars.ARM_SUBSCRIPTION_ID }}
          TF_VAR_azure_client_id: ${{ vars.ARM_CLIENT_ID }}
        run: terraform plan -detailed-exitcode

      - name: Notify Google Chat of Failure
        uses: google-github-actions/send-google-chat-webhook@v0.0.2
        if: steps.plan.outputs.exitcode == 2
        with:
          webhook_url: ${{ secrets.GCHAT_WEBHOOK_URL }}
          mention: <users/devante@pick2solutions.com>

      # Pause until an approver signs off on correcting the drift.
      - name: Approve Terraform Plan
        uses: trstringer/manual-approval@v1
        if: steps.plan.outputs.exitcode == 2
        with:
          secret: ${{ github.token }}
          approvers: dppick2solutions

      - name: Terraform Apply
        if: steps.plan.outputs.exitcode == 2
        env:
          ARM_CLIENT_ID: ${{ vars.ARM_CLIENT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ vars.ARM_SUBSCRIPTION_ID }}
          ARM_TENANT_ID: ${{ vars.ARM_TENANT_ID }}
          TF_VAR_azure_subscription_id: ${{ vars.ARM_SUBSCRIPTION_ID }}
          TF_VAR_azure_client_id: ${{ vars.ARM_CLIENT_ID }}
        run: terraform apply -auto-approve
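When analyzing the workflow above, there are a few key details to note. First, the workflow is configured to run both on a schedule (the cron expression 0 12 * * * fires daily at 12:00 UTC) and manually using the workflow_dispatch trigger. In the terraform plan step, the -detailed-exitcode flag plays a crucial role: it makes Terraform return specific exit codes based on the plan's outcome. The hashicorp/setup-terraform wrapper exposes that code as steps.plan.outputs.exitcode, which the later steps check: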
Exit Code 0: No changes were detected. The infrastructure matches the desired state.
Exit Code 1: An error occurred.
Exit Code 2: Changes were detected.
If Terraform returns an exit code of 2, the workflow automatically alerts the team via a Google Chat webhook and triggers a manual approval process in GitHub to review and apply the necessary changes to correct the drift.
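The exit codes described above can also be handled in a plain shell script, which is handy when trying the check locally or porting it to another CI tool. The following is a minimal sketch rather than the workflow's actual implementation: the function names describe_plan_exit and check_drift are hypothetical, and it assumes terraform init has already been run in the working directory.

```shell
#!/bin/sh

# Map a `terraform plan -detailed-exitcode` exit status to a short label.
# (Hypothetical helper; not part of Terraform or the workflow above.)
describe_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;   # no changes: infrastructure matches the code
    1) echo "error" ;;     # the plan itself failed
    2) echo "drift" ;;     # changes detected
    *) echo "unknown" ;;   # anything else is unexpected
  esac
}

# Run the plan quietly and print the label for its exit status.
# Assumes `terraform init` has already been run in this directory.
check_drift() {
  rc=0
  terraform plan -detailed-exitcode -input=false -no-color >/dev/null || rc=$?
  describe_plan_exit "$rc"
}
```

A caller could then branch on the label, for example by running check_drift and sending a notification only when it prints drift.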
Let’s test it out!
First, we deploy our infrastructure with a GitHub Workflow run.
Then, we take a peek at Azure to make sure everything is there.
Next, we’ll modify the Azure Function app settings. Again, ideally RBAC should be in place to limit unauthorized changes, but in reality, there’s likely at least one environment where this isn’t enforced yet.
On its next scheduled run, our workflow detects that there was a manual change and sends us a message in our Google Workspace. If you’re curious about setting up the Google Chat webhook, check out https://github.com/google-github-actions/send-google-chat-webhook. There are similar Actions for MS Teams, Slack, etc.
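If you would rather not depend on an Action for the notification, a Google Chat incoming webhook can also be called directly: it accepts an HTTP POST whose JSON body contains a text field. Here is a rough sketch, assuming the webhook URL is available in an environment variable named GCHAT_WEBHOOK_URL (mirroring the secret used in the workflow). The function names are hypothetical, and the escaping only handles backslashes and double quotes, so it suits short single-line messages.

```shell
#!/bin/sh

# Build the JSON body for a simple text message.
# Escapes backslashes first, then double quotes; newlines are not handled.
build_chat_payload() {
  printf '{"text": "%s"}' \
    "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# POST the message to the webhook URL held in GCHAT_WEBHOOK_URL (assumed name).
send_drift_alert() {
  curl --silent --show-error --fail \
    -X POST \
    -H 'Content-Type: application/json; charset=UTF-8' \
    --data "$(build_chat_payload "$1")" \
    "$GCHAT_WEBHOOK_URL"
}
```

For example, send_drift_alert "Terraform drift detected in main" would post that sentence to the chat space behind the webhook.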
Going into the workflow, we’re presented with the option to approve the run, resetting our Infrastructure back to what’s defined in our Terraform repo.
And that’s it! We’re back to our Infrastructure reflecting what’s in our code.
Conclusion
Automating drift detection helps enforce Terraform as the source of truth and reduces manual configuration errors. Whether using GitHub Actions, Azure Pipelines, Jenkins, or another CI/CD tool, adding a scheduled Terraform drift check is a great way to maintain infrastructure consistency and prevent unexpected changes.
By continuously monitoring your cloud environment, you help keep Terraform in control and reduce the risks associated with Click-Ops practices. Try setting up one of these pipelines today and take another step toward fully automated infrastructure management!
If you’re interested, the code for this demo is located here.
Need Help with Cloud, Automation, or Development?
At Pick 2 Solutions, we specialize in cloud infrastructure, automation, and software development. Whether you're looking to streamline your DevOps processes, improve your Terraform workflows, or build scalable applications, we’re here to help.
📩 Get in touch: info@pick2solutions.com
Let’s work together to optimize your cloud strategy! 🚀