
IaC Made Infrastructure Repeatable. It Didn't Make It Simple.

Infrastructure as Code solved configuration drift and gave us reproducible environments. But the trade was swapping ClickOps for 10,000 lines of Terraform. That's a very mediocre form of progress.

By Avin Kavish
Analysis · Infrastructure · DevOps
Background

Infrastructure as Code was supposed to be the answer: no more clicking through AWS consoles, no more "it works on my machine," no more infrastructure that lives only in the memory of the technician who built it.

Version-controlled, reproducible infrastructure treated like software worked, to an extent. It fixed a few problems, but really it substituted one kind of complexity for another.

ClickOps on a platform like AWS is chaotic. You start creating something in one corner of the console, and by the time you reach the other side, you've forgotten what you were doing.

IaC, on the other hand, lets you see the state of the infrastructure at a glance.

What IaC Actually Delivered

I think Infrastructure as Code is a substantial improvement over what came before.

It solved the following:

  • Infrastructure as state - Terraform files as the source of truth
  • Reproducible environments - Dev, staging, and prod built identically
  • Version control - Infra changes tracked in git
  • No more ClickOps disasters - Mitigation of configuration mishaps
  • Disaster recovery - Rebuild everything from code

So I understand why IaC adoption exploded. It solved real, painful problems.

What It Didn't Deliver

It didn't necessarily reduce the amount of work required to maintain cloud infrastructure.

When I first heard about Terraform, I was excited, thinking it was going to solve all my infrastructure pain points.

But what actually happened was that I had to become a student again. I had to learn:

  • HCL syntax and its quirks
  • State management (local vs remote, locking, encryption)
  • Terraform modules (writing, versioning, consuming, debugging)
  • Provider-specific syntax (the AWS provider alone has thousands of resources)
  • Resource dependencies and ordering issues
  • Data sources vs resources (and when to use which)
  • Terraform Cloud vs Terraform Enterprise vs self-hosted
  • How to refactor without accidentally destroying production

That wasn't even an exhaustive list. This was learning an entirely new tool with its own concepts, patterns, and footguns.
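
Take just one of those bullets - state management. Before a team can collaborate safely, you need a remote backend with locking and encryption, something like this (a minimal sketch; the bucket and lock-table names are hypothetical):

# Remote state: stored in S3, encrypted, locked via DynamoDB
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"   # hypothetical S3 bucket
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"      # hypothetical DynamoDB lock table
  }
}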

Debugging Experience

I had a frustrating experience when terraform apply failed with:

Error: error creating ECS Service: InvalidParameterException:
Unable to assume the service linked role. Please verify that
the ECS service linked role exists.

To overcome this, I had to spend the next hour:

  1. Googling the error (before ChatGPT)
  2. Finding a 3-year-old Stack Overflow post
  3. Realizing it was about IAM permissions
  4. Checking if the service-linked role existed
  5. Creating it manually (through the console)
  6. Trying again
  7. Hitting a different error about subnet groups

I wasn't debugging my infrastructure. I was debugging Terraform's translation of my intent into AWS API calls.
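
In hindsight, even the manual fix could have been codified. Assuming the missing piece really was the default ECS service-linked role, a sketch like this would have avoided the console detour:

# Create the service-linked role ECS needs before it can manage services
resource "aws_iam_service_linked_role" "ecs" {
  aws_service_name = "ecs.amazonaws.com"
}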

Maintenance Overhead

Six months into using Terraform, I discovered the maintenance burden:

  • AWS provider updated to x.x with breaking changes
  • Half my modules used deprecated resources
  • State was locked and I couldn't figure out why
  • Needed to upgrade Terraform version for a new feature
  • Had to refactor 50 resources to use a new naming scheme

I wasn't just maintaining infrastructure. I was maintaining infrastructure code that managed infrastructure. Meta-work.
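
Part of that meta-work is defensive version pinning, so provider upgrades happen when you choose rather than whenever terraform init runs. A typical sketch:

# Pin the provider's major version so breaking changes arrive on your schedule
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # allows 5.x updates, blocks 6.0
    }
  }
}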

The Mental Model Problem

To deploy a database with Terraform, I needed to understand:

  1. How databases work (reasonable)
  2. How AWS RDS works (cloud-specific)
  3. How Terraform models RDS (IaC-specific)
  4. How to structure Terraform code properly (tool-specific)

Four layers of knowledge for one database.

And when things broke, I needed to understand:

  1. How Terraform state works internally
  2. How AWS API rate limits affect Terraform
  3. How to read Terraform debug logs
  4. How to import existing resources into state

This cognitive overhead slowed me down. I started despising infrastructure changes because the friction was too high.
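
Importing alone is its own mini-discipline. With Terraform 1.5+, adopting a database that was created outside Terraform looks roughly like this (a sketch; the identifier matches the RDS example in the next section):

# Adopt an existing RDS instance into state (Terraform 1.5+ import block)
import {
  to = aws_db_instance.main
  id = "my-postgres-db"
}

You still have to write (or generate) matching resource configuration before the plan comes out clean.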

The Code

Here's what actually building with IaC looks like:

Example 1: Deploying a Postgres Database

When you need to deploy an RDS instance, it typically looks like this:

# RDS Subnet Group
resource "aws_db_subnet_group" "main" {
  name       = "my-db-subnet-group"
  subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]

  tags = {
    Name = "My DB subnet group"
  }
}

# Security Group
resource "aws_security_group" "db" {
  name        = "db-security-group"
  description = "Security group for RDS"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS Parameter Group
resource "aws_db_parameter_group" "main" {
  name   = "my-postgres-params"
  family = "postgres15"

  parameter {
    name  = "shared_preload_libraries"
    value = "pg_stat_statements"
  }

  parameter {
    name  = "log_min_duration_statement"
    value = "1000"
  }
}

# RDS Instance
resource "aws_db_instance" "main" {
  identifier           = "my-postgres-db"
  engine               = "postgres"
  engine_version       = "15.4"
  instance_class       = "db.t3.medium"
  allocated_storage    = 100
  storage_type         = "gp3"
  storage_encrypted    = true

  db_name  = "myapp"
  username = "admin"
  password = var.db_password # Stored in Terraform variables

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.db.id]
  parameter_group_name   = aws_db_parameter_group.main.name

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "mon:04:00-mon:05:00"

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  skip_final_snapshot       = false
  final_snapshot_identifier = "my-postgres-final-snapshot"

  tags = {
    Name        = "My Postgres DB"
    Environment = "production"
  }
}

# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "db_cpu" {
  alarm_name          = "db-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "This metric monitors RDS CPU"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.main.id
  }
}

That's about 100 lines, and that's the simplified version. It's not including:

  • Read replicas
  • Backup verification
  • Secret rotation setup
  • Enhanced monitoring
  • Performance Insights configuration
  • Additional CloudWatch alarms
  • Output variables for connection strings

all of which you need in production.
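
Even the smallest of those items adds lines. The output variables alone look something like this (a sketch wired to the resources above):

# Expose connection details for other modules and CI to consume
output "db_endpoint" {
  value = aws_db_instance.main.endpoint
}

output "db_connection_string" {
  value     = "postgres://${aws_db_instance.main.username}:${var.db_password}@${aws_db_instance.main.endpoint}/${aws_db_instance.main.db_name}"
  sensitive = true  # keeps the password out of plan/apply output
}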

When I built Viduli, I wanted to eliminate all of this. So on Viduli, the same database deployment is:

  1. Click "Create Service"
  2. Select "Orbit" (database service)
  3. Choose "Postgres"
  4. Name it (which is also generated automatically)
  5. Click "Create"

20 seconds. No code. The database, security, backups, monitoring, high availability - all configured the way they should be by default.

Example 2: Deploying a Web Application with Auto-Scaling

For a containerized web application that needed to scale automatically, my Terraform looked like this (again, simplified):

# ECR Repository
resource "aws_ecr_repository" "app" {
  name = "my-app"
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "my-cluster"
}

# Task Execution Role
resource "aws_iam_role" "ecs_task_execution" {
  name = "ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Task Definition
resource "aws_ecs_task_definition" "app" {
  family                   = "my-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  container_definitions = jsonencode([
    {
      name  = "my-app"
      image = "${aws_ecr_repository.app.repository_url}:latest"
      portMappings = [
        {
          containerPort = 8080
          protocol      = "tcp"
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.app.name
          "awslogs-region"        = "us-east-1"
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "my-app-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
}

# Target Group
resource "aws_lb_target_group" "app" {
  name        = "my-app-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    matcher             = "200"
    path                = "/health"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 3
  }
}

# Listener
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# ECS Service
resource "aws_ecs_service" "app" {
  name            = "my-app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = [aws_subnet.private_a.id, aws_subnet.private_b.id]
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "my-app"
    container_port   = 8080
  }
}

# Auto Scaling Target
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 10
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Auto Scaling Policy
resource "aws_appautoscaling_policy" "ecs_cpu" {
  name               = "cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70.0
  }
}

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/my-app"
  retention_in_days = 7
}

That's about 200 lines. And it's still missing:

  • VPC and subnet definitions
  • Security group rules
  • HTTPS/SSL certificates
  • DNS configuration
  • CI/CD pipeline setup
  • Container registry authentication
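
To make "missing" concrete: the HTTPS listener alone is another block like this (a sketch; the ACM certificate resource is hypothetical and not shown):

# HTTPS listener - assumes an ACM certificate is defined elsewhere
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.app.arn  # hypothetical certificate

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}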

On Viduli, I wanted developers (well, myself mostly) to skip all of this:

  1. Click "Create Service"
  2. Select "Ignite" (application service)
  3. Connect their GitHub repository
  4. Click "Create"

20 seconds. The auto-scaling, load balancing, health checks, monitoring, and CI/CD pipeline all come configured by default.

Example 3: Adding a Redis Cache

The Redis cache setup in Terraform:

# ElastiCache Subnet Group
resource "aws_elasticache_subnet_group" "main" {
  name       = "redis-subnet-group"
  subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
}

# Security Group for Redis
resource "aws_security_group" "redis" {
  name        = "redis-security-group"
  description = "Security group for Redis"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 6379
    to_port     = 6379
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }
}

# ElastiCache Parameter Group
resource "aws_elasticache_parameter_group" "main" {
  name   = "redis-params"
  family = "redis7"

  parameter {
    name  = "maxmemory-policy"
    value = "allkeys-lru"
  }
}

# ElastiCache Replication Group
resource "aws_elasticache_replication_group" "main" {
  replication_group_id       = "my-redis"
  replication_group_description = "Redis cluster"
  engine                     = "redis"
  engine_version            = "7.0"
  node_type                 = "cache.t3.micro"
  number_cache_clusters     = 2
  parameter_group_name      = aws_elasticache_parameter_group.main.name
  port                      = 6379
  subnet_group_name         = aws_elasticache_subnet_group.main.name
  security_group_ids        = [aws_security_group.redis.id]

  automatic_failover_enabled = true

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                = var.redis_password

  snapshot_retention_limit = 5
  snapshot_window         = "03:00-05:00"

  tags = {
    Name = "My Redis Cluster"
  }
}

# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "redis_cpu" {
  alarm_name          = "redis-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ElastiCache"
  period              = "300"
  statistic           = "Average"
  threshold           = "75"

  dimensions = {
    # Alarms target a member node of the group, e.g. "my-redis-001"
    CacheClusterId = "${aws_elasticache_replication_group.main.id}-001"
  }
}

About 80 lines.
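
And that count leaves out supporting declarations, like the sensitive variable behind the auth token (a minimal sketch):

# The variable referenced by auth_token above
variable "redis_password" {
  type      = string
  sensitive = true  # redacted from plan/apply output
}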

The Viduli Difference

On Viduli:

  1. Click "Create Service"
  2. Select "Flash" (cache service)
  3. Choose "Redis"
  4. Click "Create"

20 seconds.

IaC Abstraction

With IaC, I was describing HOW to build infrastructure:

  • "Create a subnet group in these availability zones"
  • "Create a security group with these ingress rules"
  • "Create an RDS instance with these parameters"
  • "Connect them using these references"

I was thinking at the AWS API level. I needed to know about subnet groups, security groups, parameter groups, IAM roles, and how they all connected.

When I built Viduli, I wanted to just declare WHAT I needed:

  • "I need a Postgres database"

And let the platform handle the rest. VPCs, subnet groups, security policies - all of that is tangential to the problem I'm actually solving.

Do I need to understand TCP congestion control to build a web app? No, because HTTP abstracts it away. Same principle here.

What Actually Matters

When I need to deploy a database, do I really need to know:

  • Which availability zones to use? (AWS implementation detail)
  • How to configure subnet groups? (AWS implementation detail)
  • Which security group rules to allow? (AWS implementation detail)
  • What parameter group settings are optimal? (Database + AWS-specific tuning)

Or do I just need a working database?

Most of the time, I just needed a working database.

The complexity of subnet groups isn't inherent to running a database. It's specific to how AWS chose to model infrastructure. That's AWS's problem, not mine.

Viduli eliminates that layer entirely.

When IaC Makes Sense

I'm not anti-IaC across the board. There are situations where Terraform or similar tools are genuinely the right choice:

Multi-Cloud Strategies

If you're running across AWS, GCP, and Azure, IaC can provide a unified interface. Write Terraform once (theoretically) and deploy everywhere.

This makes sense at certain scales.

Highly Customized Infrastructure

If you need control over every detail - specific kernel parameters, custom networking setups, non-standard configurations - IaC gives you that control.

Some applications genuinely need this level of customization. Most don't.

Compliance Requirements

Some industries require specific configurations and explicit audit trails at the infrastructure level. IaC provides that explicit control and documentation.

If compliance mandates it, IaC makes sense.

Platform Teams at Scale

If you're building internal platforms for 100+ engineers, you might need the flexibility and control that IaC provides.

At that scale, investing in IaC expertise is justified.

When I Wanted Something Different

Most of my projects fell into a different category:

I Wanted to Ship Product

My business value was in the application, not the infrastructure. I wanted to focus on features, not subnet configurations.

I was building a SaaS app, not competing with AWS on infrastructure.

Small Teams

With fewer than 50 engineers, hiring DevOps specialists just to maintain Terraform felt unnecessary. (But really, I think Viduli will scale up to teams of any size in the future.)

I wanted production-ready infrastructure without needing the specialists.

Standard Architecture

I was building web apps, APIs, databases, caches, and background workers - the standard building blocks. I didn't need custom configurations.

I just needed these things to work.

Fast Iteration

I wanted to deploy 10 times a day, not spend an hour on each deployment tweaking VPC subnet ranges.

Control is useful when you need it. Most of the time, I needed speed.

What It Actually Cost Me

Let me be honest about the time investment I experienced:

With Infrastructure as Code

  • Learning: Weeks understanding Terraform, AWS, and best practices
  • First deployment: Hours setting everything up, even with examples
  • Each change: 15-30 minutes (write code, plan, review, apply)
  • Debugging: Hours when things broke
  • Maintenance: Ongoing - provider updates, refactoring, drift management
  • Team onboarding: Days per engineer

For a simple 3-tier app, my rough time investment was:

  • Initial setup: 2-3 days
  • Per deployment: 30 minutes
  • Monthly maintenance: 4-8 hours
  • Onboarding each new engineer: 1-2 days

What I Wanted (and Built)

  • Learning: Minutes exploring the UI
  • First deployment: 1 minute per service
  • Each change: Automatic on git push
  • Debugging: Application logs, not infrastructure
  • Maintenance: Zero (platform handles it)
  • Team onboarding: 10 minutes

For the same 3-tier app on Viduli:

  • Initial setup: 5 minutes
  • Per deployment: Automatic
  • Monthly maintenance: 0 hours
  • Onboarding: 10 minutes per engineer

The time difference compounds. Over a year, that's weeks of engineering time reclaimed for actual product work.

Does Viduli Plan to Support IaC?

IaC is not out of scope. IaC will likely be added as a convenience feature for developers who want infrastructure definitions living next to their code.

I understand the appeal of GitOps workflows where everything is in version control. It's a valid preference.

The moral of the story: the radical simplicity of the platform leads to radically simple IaC with a much shallower learning curve.

What Viduli IaC Might Look Like

Imagine this in your repository:

# infrastructure.py
from viduli import Project, Ignite, OrbitPostgres, FlashRedis

project = Project(
    name="hello-world",
    domains=["my-super-cool-saas.com"],
)

api = Ignite(
    name="api",
    github="acme/api",
    project=project,
)

worker = Ignite(
    name="worker",
    github="acme/worker",
    project=project,
)

database = OrbitPostgres(name="database", project=project)

cache = FlashRedis(name="cache", project=project)

That's it. That's your entire infrastructure definition. Real Python (or Go, or Java) code - type-safe, with IDE autocomplete.

Compared to Terraform:

  • Postgres: 100+ lines
  • Redis: 80+ lines
  • Two applications: 200+ lines each
  • Supporting infrastructure: 300+ lines
  • Total: 800-1000+ lines of HCL

Viduli IaC: ~15 lines of actual code.

Why So Simple?

Because the platform handles all the complexity:

  • No VPCs, subnets, or security groups to define
  • No IAM roles and policies to configure
  • No load balancers and auto-scaling to set up
  • No monitoring and logging to configure
  • No state files to manage

I just declare what I need. The platform provides it.

That's what I think IaC should be - infrastructure as actual code, not infrastructure as 1000 lines of configuration DSL.

Both Worlds

With Viduli, I'm not forcing a choice between simple UI or IaC.

I wanted:

  • Simple UI for quick iteration and learning
  • Simple IaC for GitOps and version control
  • Same platform underneath both

Use the UI when prototyping. Switch to IaC when ready. Or use both - create via UI, export to IaC for version control later.

What I Learned Building This

Here's what became clear to me while building Viduli:

IaC isn't the problem. The infrastructure underneath is.

Terraform is complicated because AWS is complicated. CloudFormation is complicated because AWS is complicated.

They're both trying to expose thousands of AWS resource types in a declarative format. That's inherently complex.

Better IaC syntax doesn't fix this. Simpler infrastructure does.

When the underlying platform has 10 concepts instead of 10,000, the IaC becomes proportionally simpler.

That's why Viduli can offer both:

  • Zero-code deployment via UI
  • Simple IaC when you want it

Because the platform underneath is fundamentally simpler.

Conclusion

IaC made infrastructure repeatable. That was important. That was necessary.

But repeatable isn't the endgame. Simple is.

I shouldn't need 1,000 lines of IaC to deploy a web app. I shouldn't need to understand VPC networking to run one. I shouldn't have to spend weeks learning Terraform to ship product.

Modern platforms can be both repeatable and simple. It doesn't have to be a mutually exclusive choice.

IaC solved yesterday's problem. I think platforms solve today's.

And when infrastructure definitions need to live in code, that can be supported too - but simpler than we thought possible.

Because the goal was never "infrastructure as code."

The goal was always "infrastructure that just works."

