Infrastructure as code: why you need it and how to start small

There is a failure mode that plays out quietly in organisations across South Africa every year. A cloud environment is built — initially by one engineer, or a small team, moving fast to get something live. Resources are created through the cloud console. Configuration is adjusted manually. Scripts accumulate in someone's local folder. Then the engineer leaves, or the team grows, or something breaks in production and nobody can reproduce the environment in which it last worked.

This is not a story about negligence. It is a story about the natural outcome of managing infrastructure the wrong way — treating it as something you click into existence rather than something you write, version, and deploy like code.

Infrastructure as code (IaC) is the practice of defining and managing your infrastructure through configuration files and code rather than manual processes or console interactions. It is one of the highest-leverage investments an engineering organisation can make, and it is accessible at any scale.

What infrastructure as code actually means

At its core, IaC means that every resource in your cloud environment — compute instances, databases, networking configuration, IAM roles, storage buckets, load balancers — is described in a file that can be stored in version control, reviewed in a pull request, and applied automatically through a deployment pipeline.

The most widely used tools are Terraform (cloud-agnostic, works with Azure, AWS, GCP, and most managed services), Bicep and ARM templates (Azure-native), CloudFormation (AWS-native), and Pulumi (which lets you write infrastructure in general-purpose languages such as TypeScript, Python, or Go rather than a domain-specific language).

The choice of tool matters less than the discipline of using one consistently. What you are trying to achieve is a state where you can look at a git repository and understand, from that repository alone, what your infrastructure looks like — without having to log into a console and click around.

Why South African organisations need this now

The business case for IaC has always been strong. In the South African context it is especially compelling, for a few reasons.

The talent market makes institutional knowledge dangerous. South African engineering talent is globally competitive and globally mobile. When infrastructure knowledge lives in one person's head — or in undocumented manual configurations — a resignation becomes a risk event. IaC externalises that knowledge into version-controlled files that survive personnel changes.

Cloud cost accountability requires it. One of the most common causes of cloud overspend in South African organisations is infrastructure that was created for a purpose and never decommissioned — because nobody was sure what it was for, or whether it was still needed. When infrastructure is defined as code, you can see exactly what exists, why it exists (from commit history and comments), and remove it confidently when it is no longer needed.
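Labels carry that "why it exists" beyond the repository, into billing exports and cost reports. A minimal sketch, assuming the Google provider 5.x used in the GCP example later in this article: `default_labels` on the provider block applies the same labels to every resource in the configuration that supports labels. The keys and values are hypothetical; choose a scheme that matches how your finance team allocates spend.

```hcl
provider "google" {
  project = var.project_id # assumed to be declared as an input variable elsewhere
  region  = "africa-south1"

  # Added to every resource in this configuration that supports labels.
  # Hypothetical scheme: adjust keys and values to your own cost allocation.
  default_labels = {
    team        = "payments"
    cost-centre = "cc-1042"
    managed-by  = "terraform"
  }
}
```

A side effect worth noting: anything created by hand in the console will lack these labels, which makes it easy to spot in a billing report filtered on `managed-by`.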

Compliance and audit obligations are easier to meet. POPIA, FSCA regulatory requirements, and SOC 2 audits all expect demonstrable controls over how data is stored and who can access it. When your IAM policies, encryption settings, and network controls are defined in code and reviewed in pull requests, you have an auditable record of every change: what changed, who approved it, and why. That record is genuinely difficult to produce when changes are made ad hoc through the console, where audit logs may show that a change happened but not who reviewed it or the reason for it.

Disaster recovery becomes real rather than theoretical. Most organisations have a disaster recovery plan that assumes environments can be rebuilt. In practice, rebuilding a manually configured environment is enormously time-consuming and error-prone. With IaC, rebuilding the infrastructure is a matter of running a deployment pipeline; your data still has to be restored from backups, but into an environment you know is correct. The difference between a four-day recovery and a four-hour recovery is often whether you have your infrastructure defined as code.

The three things IaC actually gives you

It is worth being precise about what you are gaining, because the benefits are sometimes described in abstract terms that obscure what changes in practice.

Reproducibility. You can stand up an identical copy of any environment — production, staging, testing — from the same code. This means testing environment changes before they reach production, onboarding new developers without manual setup, and recovering from failure without archaeological digging.

Change history. Every change to infrastructure is a commit. You know what changed, when it changed, and who approved it. When something breaks, you are debugging against a clear record of recent changes rather than guessing.

Consistency across environments. The gap between development and production environments is one of the most common sources of bugs and incidents. When both are defined by the same code — with environment-specific variables substituted in — the gap shrinks substantially.
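In Terraform, the usual mechanism for this substitution is one variable definitions file per environment. A sketch, assuming a root module that uses `project_id`, `service_name` and `container_image` input variables (the same three the GCP example later in this article references); the file names, project IDs and image path are hypothetical. The two files are shown together here but live separately on disk.

```hcl
# environments/staging.tfvars (hypothetical)
project_id      = "your-org-staging"
service_name    = "app-staging"
container_image = "africa-south1-docker.pkg.dev/your-org-staging/apps/app:1.4.2"

# environments/production.tfvars (hypothetical)
project_id      = "your-org-production"
service_name    = "app"
container_image = "africa-south1-docker.pkg.dev/your-org-production/apps/app:1.4.2"
```

Each environment is then planned with `terraform plan -var-file=environments/staging.tfvars` (or the production file). Because a backend block cannot reference variables, each environment also needs its own state location, for example by passing a different prefix with `terraform init -backend-config="prefix=environments/staging"` or by using separate workspaces.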

How to start when you have existing infrastructure

The hardest part of adopting IaC for most organisations is not learning the tools — it is the question of how to start when you already have infrastructure that was built manually. The answer is not to stop everything and rewrite it all, and it is not to give up because it seems too complex to import. The answer is to start with new infrastructure.

Pick the next piece of infrastructure you need to create (a new environment, a new service, a new database) and create it with Terraform (or your chosen tool) rather than through the console. Do this consistently for every new resource from that point forward. Over time, your IaC-managed estate will grow relative to your manually managed one, and you can import or replace the legacy resources incrementally.
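When the time comes to bring a legacy resource under management, Terraform 1.5 and later support declarative `import` blocks, so the import itself goes through the same pull-request review as any other change. A minimal sketch for a console-created GCS bucket; the bucket name and location are hypothetical, and the resource block has to describe the bucket as it really exists.

```hcl
# Requires Terraform 1.5 or later.
# Adopt an existing, console-created bucket into Terraform management.
import {
  to = google_storage_bucket.legacy_uploads
  id = "your-org-legacy-uploads" # hypothetical existing bucket name
}

resource "google_storage_bucket" "legacy_uploads" {
  name     = "your-org-legacy-uploads"
  location = "AFRICA-SOUTH1" # must match the bucket's real location
}
```

Before applying, check that `terraform plan` reports the import and no other changes; any difference means the resource block does not yet match reality. If you leave the resource block out, `terraform plan -generate-config-out=generated.tf` can draft one from the live resource for you to review and tidy.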

For teams on Azure, Bicep is worth considering as the starting tool because Microsoft has invested heavily in first-party support and the documentation is comprehensive. For teams that need to manage resources across Azure, AWS, and GCP — which is increasingly common as organisations use best-of-breed services — Terraform gives you a single language and workflow across all three.

A practical starting configuration for South African teams

A minimal, useful IaC setup for a small South African engineering team typically needs to do four things: manage compute resources (Cloud Run, Azure Container Apps, EC2, or whatever you are running on), manage databases (connection details, backup policies, access controls), manage IAM (who can access what, with least-privilege policies), and manage networking (VPCs, subnets, firewall rules).
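The GCP example below covers compute and access; networking follows the same pattern. A minimal sketch, with hypothetical names and CIDR range, of a custom-mode VPC with one subnet in the Johannesburg region and a single, narrowly scoped firewall rule:

```hcl
# Custom-mode VPC: no automatically created subnets in every region
resource "google_compute_network" "main" {
  name                    = "main-vpc"
  auto_create_subnetworks = false
}

# One subnet in Johannesburg with an explicitly chosen range
resource "google_compute_subnetwork" "app" {
  name          = "app-africa-south1"
  region        = "africa-south1"
  network       = google_compute_network.main.id
  ip_cidr_range = "10.10.0.0/24"
}

# Allow only HTTPS, and only from inside the subnet
resource "google_compute_firewall" "allow_internal_https" {
  name          = "allow-internal-https"
  network       = google_compute_network.main.name
  direction     = "INGRESS"
  source_ranges = [google_compute_subnetwork.app.ip_cidr_range]

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }
}
```

Since GCP VPCs deny ingress by default, this rule is the only way in, and a reviewer can confirm that from the file alone.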

For a team starting from scratch with GCP (increasingly common since Google Cloud opened its Johannesburg region, africa-south1, in 2024), a basic Terraform configuration looks like this:

# Provider configuration
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
  # Remote state, critical for team use. Backend blocks cannot use
  # variables, so the bucket and prefix are literal values.
  backend "gcs" {
    bucket = "your-org-terraform-state"
    prefix = "environments/production"
  }
}

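# Input variables, supplied per environment (e.g. via a .tfvars file)
variable "project_id" {
  description = "GCP project ID to deploy into"
  type        = string
}

variable "service_name" {
  description = "Name of the Cloud Run service"
  type        = string
}

variable "container_image" {
  description = "Container image to deploy, including tag or digest"
  type        = string
}

provider "google" {
  project = var.project_id
  region  = "africa-south1"  # Johannesburg
}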

# Cloud Run service
resource "google_cloud_run_v2_service" "app" {
  name     = var.service_name
  location = "africa-south1"

  template {
    containers {
      image = var.container_image
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
    scaling {
      # Keeps one instance warm (billed while idle); 0 allows scale to zero
      min_instance_count = 1
      max_instance_count = 10
    }
  }
}

# Allow unauthenticated (public) invocation of the service
resource "google_cloud_run_v2_service_iam_member" "public" {
  name     = google_cloud_run_v2_service.app.name
  location = google_cloud_run_v2_service.app.location
  role     = "roles/run.invoker"
  member   = "allUsers"
}

This is not a complete production configuration — it is illustrative of the pattern. The point is that this file, stored in version control, represents infrastructure that can be applied, reviewed, and replicated. The resource named google_cloud_run_v2_service.app is unambiguous. The scaling configuration is auditable. The IAM assignment is explicit.
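The same explicitness can apply to the service's own identity. Unless told otherwise, Cloud Run runs containers as the project's default Compute Engine service account, which is frequently granted broad project roles. A sketch of a dedicated, least-privilege identity instead; the account ID and the Secret Manager role are hypothetical examples of what a service might need, and `var.project_id` is the variable the example above already references.

```hcl
# Dedicated runtime identity for the app service (hypothetical account ID)
resource "google_service_account" "app" {
  account_id   = "app-runtime"
  display_name = "Cloud Run runtime identity for the app service"
}

# Grant only what the service needs: here, read access to Secret Manager secrets
resource "google_project_iam_member" "app_secret_accessor" {
  project = var.project_id
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.app.email}"
}
```

To use it, set `service_account = google_service_account.app.email` inside the service's `template` block. Granting the role on individual secrets rather than on the whole project narrows access further.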

State management — the thing teams get wrong first

One aspect of IaC that catches teams unprepared is state management. Terraform maintains a state file that tracks the relationship between your configuration and the actual resources in your cloud account. If this state file is stored locally, which it is by default, problems start the moment more than one person works on the infrastructure: each person's copy of the state only knows about the changes they applied, so a colleague's plan can try to recreate resources that already exist or miss changes entirely.

Remote state backends solve this: the state file is stored in a shared location (a GCS bucket on GCP, an S3 bucket on AWS, an Azure Blob Storage container) and locked during operations so that concurrent changes do not corrupt it. Locking is automatic on the GCS and Azure backends; on the S3 backend it has to be enabled explicitly. Setting up remote state should be one of the first things you do, even if you are the only person working on the infrastructure today. The cost of setting it up is low. The cost of not having it when you need it is high.
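On GCP, the bucket named in the backend block of the example above must already exist before `terraform init` can use it, so it is usually created once from a small, separate bootstrap configuration. A sketch of that bucket, reusing the example's placeholder name; object versioning keeps earlier copies of the state file so that a bad write can be rolled back.

```hcl
resource "google_storage_bucket" "terraform_state" {
  name     = "your-org-terraform-state" # must match the backend block's bucket
  location = "AFRICA-SOUTH1"

  # IAM-only access, and no way to make the state publicly readable
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  # Keep previous state versions for recovery
  versioning {
    enabled = true
  }

  # Refuse to destroy this bucket through Terraform by accident
  lifecycle {
    prevent_destroy = true
  }
}
```

Access to this bucket deserves the same care as production credentials, because state files can contain secrets in plain text.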

What a mature IaC practice looks like

For organisations further along in the journey, the infrastructure as code practice extends beyond resource creation into policy enforcement. Tools like Sentinel (HCP Terraform and Terraform Enterprise), Checkov, and tfsec (now part of Trivy) allow you to define rules (no public storage buckets, all databases must have encryption enabled, all resources must have cost allocation tags) and enforce them automatically before changes are applied.

This is where IaC becomes a compliance and security tool as well as an operational one. When your policies are code, they are consistently enforced. Once console write access to production is restricted, the pipeline is the only path to a change, and it will reject any change that breaks one of your rules.
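Sentinel, Checkov and tfsec inspect the whole configuration or plan. For simple rules, Terraform's own variable validation is a lighter, built-in first step. A sketch of the cost-allocation-tag rule expressed that way, using a hypothetical `labels` variable; note that this is native Terraform validation, not Sentinel, Checkov or tfsec syntax.

```hcl
variable "labels" {
  description = "Labels applied to every resource; must include a cost-centre"
  type        = map(string)

  validation {
    condition     = contains(keys(var.labels), "cost-centre")
    error_message = "Labels must include a cost-centre key for cost allocation."
  }
}
```

Passing `var.labels` to the Google provider's `default_labels` argument then applies the validated labels everywhere, and a plan with no `cost-centre` label fails at validation, before any resource is touched.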

CloudNala designs and implements infrastructure as code practices for South African engineering organisations — from initial Terraform adoption to fully automated, policy-enforced deployment pipelines. Talk to us about where to start.