Infrastructure as Code Fundamentals
Manual infrastructure provisioning doesn't scale and invites configuration drift. Infrastructure as Code (IaC) makes your infrastructure reproducible, version-controlled, and self-documenting.
Why IaC?
- Reproducibility: Same code = same infrastructure
- Version control: Track and audit changes
- Collaboration: Review infrastructure changes like code
- Documentation: Code documents what exists
Terraform Basics
```hcl
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
}

# Variables
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type = string
}

# Declared here because the RDS instance below references them
variable "db_username" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}
```
Data Infrastructure Example
```hcl
# S3 bucket for data lake
resource "aws_s3_bucket" "data_lake" {
  bucket = "company-data-lake-${var.environment}"

  tags = {
    Environment = var.environment
    Purpose     = "Data Lake"
  }
}

resource "aws_s3_bucket_versioning" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id

  versioning_configuration {
    status = "Enabled"
  }
}
```
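A data-lake bucket is usually also encrypted at rest. A sketch using the same standalone-resource pattern as the versioning block above (SSE-S3 shown; swap in a KMS key ARN for SSE-KMS):

```hcl
resource "aws_s3_bucket_server_side_encryption_configuration" "data_lake" {
  bucket = aws_s3_bucket.data_lake.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
```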
```hcl
# RDS for data warehouse
resource "aws_db_instance" "analytics" {
  identifier     = "analytics-${var.environment}"
  engine         = "postgres"
  engine_version = "15"
  instance_class = "db.r6g.large"

  allocated_storage     = 100
  max_allocated_storage = 500

  db_name  = "analytics"
  username = var.db_username
  password = var.db_password

  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  skip_final_snapshot     = false
  # Required when skip_final_snapshot is false
  final_snapshot_identifier = "analytics-${var.environment}-final"
}
```
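Rather than passing the database password around as a plain variable, the value can be read from AWS Secrets Manager at plan time. A sketch; the secret name and its JSON shape are assumptions:

```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "analytics/db-credentials" # hypothetical secret name
}

locals {
  # Assumes the secret stores JSON like {"username": "...", "password": "..."}
  db_creds = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

# Then in aws_db_instance:
#   username = local.db_creds["username"]
#   password = local.db_creds["password"]
```

This keeps the credential out of `.tfvars` files and version control, though it will still appear in state, so the state backend should be encrypted.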
Modules for Reusability
```hcl
# modules/data-pipeline/main.tf
variable "pipeline_name" {
  type = string
}

variable "s3_bucket" {
  type = string
}

resource "aws_glue_job" "etl" {
  name     = var.pipeline_name
  role_arn = aws_iam_role.glue.arn # assumes an aws_iam_role "glue" defined in this module

  command {
    script_location = "s3://${var.s3_bucket}/scripts/${var.pipeline_name}.py"
    python_version  = "3"
  }
}
```
```hcl
# Usage
module "sales_pipeline" {
  source        = "./modules/data-pipeline"
  pipeline_name = "sales-etl"
  s3_bucket     = aws_s3_bucket.scripts.id
}
```
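One module definition can also stamp out many pipelines at once. A sketch using `for_each` (the pipeline names are illustrative):

```hcl
module "pipelines" {
  source   = "./modules/data-pipeline"
  for_each = toset(["sales-etl", "marketing-etl", "finance-etl"])

  pipeline_name = each.key
  s3_bucket     = aws_s3_bucket.scripts.id
}
```

Adding a pipeline then becomes a one-line change to the set rather than a new module block.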
Workflow
```shell
# Initialize
terraform init

# Preview changes
terraform plan

# Apply changes
terraform apply

# Destroy (careful!)
terraform destroy
```
Best Practices
- Always use remote state storage
- Lock state to prevent concurrent modifications
- Use workspaces or directories for environments
- Keep sensitive values in secrets manager
- Review plan output before applying
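For the state-locking practice above, the `s3` backend can lock via a DynamoDB table. A sketch extending the backend block shown earlier (the table name is an assumption, and the table must have a string hash key named `LockID`):

```hcl
backend "s3" {
  bucket         = "my-terraform-state"
  key            = "prod/terraform.tfstate"
  region         = "us-east-1"
  encrypt        = true
  dynamodb_table = "terraform-locks" # hypothetical table; hash key must be "LockID"
}
```

With locking enabled, a second concurrent `terraform apply` fails fast instead of corrupting state.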
IaC is a fundamental skill for modern data engineering. Start small and expand coverage iteratively.