DevOps from Zero to Hero: EKS, Running Kubernetes on AWS

2026-05-27 | Gabriel Garrido | 19 min read


Introduction

Welcome to article thirteen of the DevOps from Zero to Hero series. In the previous article we packaged our TypeScript API as a Helm chart. Now it is time to give that chart a real home on AWS by provisioning an EKS cluster.


Amazon Elastic Kubernetes Service (EKS) is AWS’s managed Kubernetes offering. You get a production-grade control plane that AWS patches, scales, and keeps highly available. You only worry about your workloads and the worker nodes that run them. If you have been following the series, you already know how ECS works from article eight. EKS takes a different approach: instead of a proprietary API, you get standard Kubernetes, which means everything you learned in articles eleven and twelve (Kubernetes fundamentals and Helm) applies directly.


If you want to see how Kubernetes on AWS was done before EKS became the default, check out From zero to hero with kops and AWS. That article covers kops, a tool that provisions self-managed clusters. EKS has since become the go-to choice for most teams because it removes the burden of managing the control plane yourself.


In this article we will cover what EKS is, compare it with ECS, provision a full cluster with Terraform, explore node group options, set up IAM Roles for Service Accounts, configure Karpenter for autoscaling, install the AWS Load Balancer Controller, deploy our TypeScript API, and discuss storage and cost considerations. Let’s get into it.


What is EKS?

EKS gives you a managed Kubernetes control plane. That means AWS runs the API server, etcd, the scheduler, and the controller manager for you. These components run across multiple availability zones for high availability, and AWS handles upgrades, patches, and backups.


Your responsibilities are:


  • Worker nodes: You provision the EC2 instances (or Fargate profiles) where your pods run. AWS offers managed node groups that automate the lifecycle of these instances, but you still decide instance types, sizes, and scaling.
  • Networking: EKS integrates with your VPC. Pods get IP addresses from your VPC subnets using the VPC CNI plugin, which means they are first-class citizens on the network.
  • Add-ons: CoreDNS, kube-proxy, and the VPC CNI are installed by default, but you manage their versions and configuration.
  • Workloads: Everything you deploy, from Deployments to StatefulSets to CronJobs, is your responsibility.

The EKS control plane costs $0.10 per hour (about $73 per month). On top of that you pay for whatever compute you use for worker nodes. This is important to keep in mind when we discuss cost later.


EKS vs ECS: when to use each

Both EKS and ECS run containers on AWS, but they solve the problem differently. Here is how to think about the choice:


  • EKS is standard Kubernetes. If your team already knows Kubernetes, if you need portability across clouds, or if you are running complex microservice architectures with custom operators, service meshes, or advanced scheduling, EKS is the right pick. The ecosystem is massive, and nearly every tool in the CNCF landscape works out of the box.
  • ECS is AWS-native. If your workloads are straightforward, if your team is small and does not want to learn Kubernetes, or if you want tight integration with AWS services without extra controllers, ECS is simpler and cheaper (no control plane fee). The Fargate launch type means you do not manage any infrastructure at all.

A practical rule of thumb: if you have fewer than five services and no requirement for multi-cloud, start with ECS. If you have a growing platform team, need the Kubernetes ecosystem, or plan to run on multiple providers, go with EKS.


For this series we are covering both because real teams encounter both. You already deployed to ECS in article eight. Now you will see how EKS compares hands-on.


Prerequisites

Before we start, make sure you have the following installed:


# AWS CLI v2
aws --version

# Terraform
terraform --version

# kubectl
kubectl version --client

# Helm
helm version

# eksctl (optional but useful for debugging)
eksctl version

You also need an AWS account with permissions to create VPCs, EKS clusters, IAM roles, and EC2 instances. If you followed article six (AWS from scratch), you already have this set up.
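If you prefer a single pass/fail check over running each version command by hand, a small helper can report which tools are missing. This is only a sketch: the function name missing_tools is made up for this example, and it checks that each tool is on your PATH, not which version is installed.

```shell
# Print each tool that is not on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   missing_tools aws terraform kubectl helm || echo "install the tools listed above"
```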


Provisioning the VPC with Terraform

EKS clusters live inside a VPC. The VPC needs public subnets (for load balancers) and private subnets (for worker nodes). Let’s start with the network foundation.


Create a new Terraform project:


mkdir -p eks-cluster/terraform
cd eks-cluster/terraform

First, the provider and backend configuration:


# providers.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
    kubectl = {
      source  = "alekc/kubectl"
      version = "~> 2.1"
    }
  }
}

provider "aws" {
  region = var.region
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

Now the variables:


# variables.tf
variable "region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "devops-zero-to-hero"
}

variable "cluster_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.31"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

And the VPC using the official AWS module:


# vpc.tf
data "aws_availability_zones" "available" {
  filter {
    name   = "opt-in-status"
    values = ["opt-in-not-required"]
  }
}

locals {
  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.cluster_name}-vpc"
  cidr = var.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 48)]
  intra_subnets   = [for k, v in local.azs : cidrsubnet(var.vpc_cidr, 8, k + 52)]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
    "karpenter.sh/discovery"         = var.cluster_name
  }

  tags = {
    Project     = "devops-zero-to-hero"
    Environment = "dev"
  }
}

A few things to note about the subnet tags:


  • kubernetes.io/role/elb on public subnets tells the AWS Load Balancer Controller where to place internet-facing ALBs.
  • kubernetes.io/role/internal-elb on private subnets is for internal load balancers.
  • karpenter.sh/discovery on private subnets lets Karpenter find subnets to launch nodes in.

We use a single NAT gateway to keep costs down for a dev environment. In production you would want one per availability zone for redundancy.
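To see which address ranges the cidrsubnet expressions in vpc.tf produce, here is a small shell re-implementation of the same arithmetic for IPv4. It is a sketch for previewing the layout, not Terraform's own code, and it assumes the base CIDR is already aligned to its prefix (as 10.0.0.0/16 is).

```shell
# Minimal IPv4 version of Terraform's cidrsubnet(prefix, newbits, netnum).
cidrsubnet() {
  local cidr="$1" newbits="$2" netnum="$3"
  local ip="${cidr%/*}" prefix="${cidr#*/}"
  local a b c d
  IFS=. read -r a b c d <<EOF
$ip
EOF
  local newprefix=$((prefix + newbits))
  local base=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Each new subnet is 2^(32 - newprefix) addresses wide.
  local net=$(( base + (netnum << (32 - newprefix)) ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$newprefix"
}

# Same index math as vpc.tf: k, k + 48 and k + 52 for three AZs.
for k in 0 1 2; do
  echo "private: $(cidrsubnet 10.0.0.0/16 4 "$k")  public: $(cidrsubnet 10.0.0.0/16 8 $((k + 48)))  intra: $(cidrsubnet 10.0.0.0/16 8 $((k + 52)))"
done
```

The private subnets come out as 10.0.0.0/20, 10.0.16.0/20 and 10.0.32.0/20, which together cover 10.0.0.0 through 10.0.47.255. That is why the public /24 subnets start at index 48: it is the first /24 that does not overlap the private ranges. The larger /20 blocks matter because every pod takes an IP address from these subnets.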


Provisioning the EKS cluster

Now for the main event. We will use the official EKS Terraform module, which wraps a lot of complexity into a clean interface:


# eks.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  # Cluster access
  cluster_endpoint_public_access = true

  # Cluster add-ons
  cluster_addons = {
    coredns                = {}
    eks-pod-identity-agent = {}
    kube-proxy             = {}
    vpc-cni                = {}
  }

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Give the Terraform identity admin access to the cluster
  enable_cluster_creator_admin_permissions = true

  # Managed node groups
  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]

      min_size     = 2
      max_size     = 5
      desired_size = 2

      labels = {
        role = "general"
      }

      tags = {
        "karpenter.sh/discovery" = var.cluster_name
      }
    }
  }

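  # Tag the shared node security group so the EC2NodeClass
  # securityGroupSelectorTerms (karpenter.sh/discovery) can find it
  node_security_group_tags = {
    "karpenter.sh/discovery" = var.cluster_name
  }

  tags = {
    Project     = "devops-zero-to-hero"
    Environment = "dev"
  }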
}

This creates an EKS cluster with a managed node group of two t3.medium instances. Let’s break down what is happening:


  • cluster_endpoint_public_access: Makes the Kubernetes API reachable from the internet. For production you might restrict this to specific CIDR blocks or use a VPN.
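  • cluster_addons: These are the essential EKS add-ons. CoreDNS handles service discovery, kube-proxy manages network rules, vpc-cni gives pods VPC-native IP addresses, and eks-pod-identity-agent runs the agent behind EKS Pod Identity, an alternative to IRSA for granting IAM roles to pods.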
  • enable_cluster_creator_admin_permissions: Grants the IAM identity that creates the cluster full admin access. Without this, you can lock yourself out.
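  • eks_managed_node_groups: We define one node group that starts with 2 nodes and may range between 2 and 5. These values are only bounds: the group does not resize itself, so something like the Cluster Autoscaler has to adjust the desired size.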
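Once the apply finishes, you can confirm from the AWS side that the pieces described above exist. The helper below is a sketch: the name eks_overview is an assumption for this example, while the aws eks subcommands it calls are standard AWS CLI v2 commands.

```shell
# Print the cluster status and version, then its node groups and add-ons.
eks_overview() {
  local name="$1" region="$2"
  echo "status/version:"
  aws eks describe-cluster --name "$name" --region "$region" \
    --query 'cluster.[status,version]' --output text
  echo "node groups:"
  aws eks list-nodegroups --cluster-name "$name" --region "$region" \
    --query 'nodegroups' --output text
  echo "add-ons:"
  aws eks list-addons --cluster-name "$name" --region "$region" \
    --query 'addons' --output text
}

# Usage:
#   eks_overview devops-zero-to-hero us-east-1
```

A healthy cluster reports the status ACTIVE, and the add-on list should contain the four entries from cluster_addons.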

Node groups: understanding your options

EKS gives you three ways to run your workloads. Each has trade-offs:


  • Managed node groups: AWS handles the EC2 instance lifecycle. You pick instance types and sizes, and AWS takes care of provisioning, draining, and updating nodes. This is the default choice for most teams. The example above uses managed node groups.
  • Self-managed node groups: You create and manage the EC2 instances yourself using Auto Scaling Groups. This gives you full control but more operational overhead. Use this only if you need custom AMIs, GPUs with specific drivers, or unusual instance configurations.
  • Fargate profiles: AWS runs your pods on serverless compute. No EC2 instances to manage at all. Each pod gets its own isolated micro-VM. This is great for batch jobs or workloads with unpredictable scaling, but it has limitations: no DaemonSets, no persistent volumes backed by EBS, and higher per-pod cost compared to well-utilized EC2 instances.

For most workloads, start with managed node groups. If you need more sophisticated scaling (which we will set up shortly), add Karpenter on top.


IAM Roles for Service Accounts (IRSA)

This is one of the most important EKS concepts to understand. Your pods often need to talk to AWS services: reading from S3, writing to DynamoDB, sending messages to SQS. The old approach was to attach IAM policies to the node’s instance profile, but that means every pod on that node gets the same permissions. That is a security nightmare.


IRSA solves this by letting you map a Kubernetes ServiceAccount to a specific IAM role. Only pods using that ServiceAccount get those permissions. Here is how it works under the hood:


Pod (with ServiceAccount annotation)
  --> Kubernetes mounts a projected token
    --> AWS STS validates the token via OIDC
      --> Pod assumes the IAM role
        --> Pod gets temporary AWS credentials

Every EKS cluster exposes an OpenID Connect (OIDC) issuer, and the EKS module registers it with IAM as an OIDC identity provider. When a pod starts, Kubernetes mounts a signed JWT for its ServiceAccount. AWS STS validates this token against the OIDC provider and issues temporary credentials for the mapped IAM role. No long-lived credentials, no shared permissions.
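To make the token less abstract, you can decode it from inside a pod that uses an annotated ServiceAccount. EKS sets the AWS_WEB_IDENTITY_TOKEN_FILE environment variable in such pods to the path of the projected token. The helper below is a sketch: the name jwt_payload is made up, and it assumes a base64 command that accepts -d (GNU coreutils and recent macOS do).

```shell
# Decode the payload (second dot-separated segment) of a JWT read from stdin.
# JWT segments are base64url-encoded without padding, so translate the
# alphabet back and restore the '=' padding before decoding.
jwt_payload() {
  local payload pad
  payload="$(cut -d. -f2 | tr '_-' '/+')"
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Usage, inside a pod with an IRSA-annotated ServiceAccount:
#   jwt_payload < "$AWS_WEB_IDENTITY_TOKEN_FILE"
```

In the decoded claims, iss is the cluster's OIDC issuer URL, aud is sts.amazonaws.com, and sub has the form system:serviceaccount:<namespace>:<name>. The trust policy on the IAM role matches on aud and sub, which is what ties the role to one specific ServiceAccount.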


Here is how to set up IRSA for a pod that needs S3 access:


# irsa.tf

# The OIDC provider is created by the EKS module automatically
# We just need to create the IAM role and policy

module "s3_reader_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name = "${var.cluster_name}-s3-reader"

  role_policy_arns = {
    policy = aws_iam_policy.s3_read.arn
  }

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["default:s3-reader"]
    }
  }
}

resource "aws_iam_policy" "s3_read" {
  name        = "${var.cluster_name}-s3-read"
  description = "Allow reading from the application S3 bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:ListBucket"
        ]
        Resource = [
          "arn:aws:s3:::my-app-bucket",
          "arn:aws:s3:::my-app-bucket/*"
        ]
      }
    ]
  })
}

Then in your Kubernetes manifest (or Helm values), you annotate the ServiceAccount:


apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/devops-zero-to-hero-s3-reader

Any pod using this ServiceAccount will automatically receive temporary AWS credentials scoped to that IAM role. This is the right way to handle AWS permissions in EKS.
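A quick way to confirm the mapping works is to run a throwaway AWS CLI pod under the ServiceAccount and ask STS who it is. This is a sketch with a few assumptions: the function name irsa_whoami and the pod name irsa-check are made up, and it pulls the public AWS CLI image from ECR Public, so the nodes need outbound internet access.

```shell
# Run a one-off AWS CLI pod under the given ServiceAccount and print
# the identity that STS sees for it.
irsa_whoami() {
  local sa="$1" ns="${2:-default}"
  kubectl run irsa-check --rm -i --restart=Never \
    --namespace "$ns" \
    --image=public.ecr.aws/aws-cli/aws-cli:latest \
    --overrides="{\"spec\":{\"serviceAccountName\":\"$sa\"}}" \
    -- sts get-caller-identity
}

# Usage:
#   irsa_whoami s3-reader default
```

If IRSA is working, the returned Arn is an assumed-role ARN for the role named in the annotation. If you see the node's instance role instead, the annotation or the role's trust policy does not match the ServiceAccount.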


Cluster autoscaler vs Karpenter

When your workloads grow, you need more nodes. There are two main options for autoscaling nodes in EKS:


  • Cluster Autoscaler: The traditional Kubernetes approach. It watches for pods that cannot be scheduled due to insufficient resources, then adds nodes from your existing node groups. It works, but it is limited by your pre-defined node group configurations. If you need a GPU instance but your node group only has t3.medium, you are stuck.
  • Karpenter: AWS’s open-source node provisioner. Instead of scaling pre-defined node groups, Karpenter looks at pending pod requirements and provisions the right instance type on the fly. It can mix instance types, use Spot instances, and right-size nodes based on actual workload needs. Because it launches instances directly through the EC2 API rather than resizing Auto Scaling Groups, it usually adds capacity faster and packs workloads onto cheaper instances.

For new clusters, Karpenter is the better choice. Let’s set it up.


Setting up Karpenter with Terraform

Karpenter needs IAM permissions to launch EC2 instances and manage their lifecycle. The official Karpenter module for Terraform makes this straightforward:


# karpenter.tf
module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"

  cluster_name = module.eks.cluster_name

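  # Use the IAM permission set required by Karpenter v1
  enable_v1_permissions = true

  # The controller role is assumed through EKS Pod Identity, so create the
  # association for the karpenter service account explicitly
  enable_pod_identity             = true
  create_pod_identity_association = true

  # Fix the node role name so it matches the role referenced in the EC2NodeClass
  node_iam_role_use_name_prefix = false
  node_iam_role_name            = "KarpenterNodeRole-${var.cluster_name}"

  # Extra policies for the node IAM role that Karpenter-provisioned nodes use
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }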

  tags = {
    Project     = "devops-zero-to-hero"
    Environment = "dev"
  }
}

# Install Karpenter using Helm
resource "helm_release" "karpenter" {
  namespace  = "kube-system"
  name       = "karpenter"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "1.1.1"
  wait       = false

  values = [
    <<-EOT
    serviceAccount:
      name: ${module.karpenter.service_account}
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
      interruptionQueue: ${module.karpenter.queue_name}
    EOT
  ]
}

After Karpenter is installed, you need to define a NodePool and an EC2NodeClass that tell Karpenter what kind of nodes to provision:


# karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r", "t"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  limits:
    cpu: "100"
    memory: 200Gi
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole-devops-zero-to-hero"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: devops-zero-to-hero
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: devops-zero-to-hero
  tags:
    Project: devops-zero-to-hero
    ManagedBy: karpenter

Apply the Karpenter resources after the cluster is ready:


kubectl apply -f karpenter-nodepool.yaml

Here is what is happening in this configuration:


  • NodePool: Defines constraints for nodes. We allow both on-demand and spot instances, restrict to modern instance families (c, m, r, t with generation > 4), and set resource limits so Karpenter does not spin up unlimited compute.
  • expireAfter: Nodes are recycled after 30 days. This ensures they pick up the latest AMIs and security patches.
  • consolidationPolicy: Karpenter actively consolidates workloads. If nodes are empty or underutilized, it moves pods around and terminates the excess nodes to save cost.
  • EC2NodeClass: Defines AWS-specific settings like the AMI, IAM role, and subnet/security group selectors.

With Karpenter running, you can scale down your managed node group to just one or two nodes for system workloads, and let Karpenter handle everything else dynamically.
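To watch Karpenter react, you can create pods that do not fit on the existing nodes. The sketch below follows the common pattern of a deployment of pause containers that each request a full CPU. The names karpenter_smoke_test and inflate are assumptions for this example, and the default of 5 replicas is arbitrary.

```shell
# Create a deployment of pause containers that each request one CPU,
# then scale it up so the pods no longer fit on the existing nodes.
karpenter_smoke_test() {
  local replicas="${1:-5}"
  kubectl create deployment inflate \
    --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=0
  kubectl set resources deployment inflate --requests=cpu=1
  kubectl scale deployment inflate --replicas="$replicas"
  # NodeClaims are the objects Karpenter creates for each node it launches.
  kubectl get nodeclaims
}

# Usage:
#   karpenter_smoke_test 5
#   kubectl delete deployment inflate   # clean up; consolidation removes the extra nodes
```

The first kubectl get nodeclaims may come back empty, since Karpenter needs a few seconds to react to the pending pods. Run it again, or follow the controller logs, to see the new node being claimed.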


AWS Load Balancer Controller

By default, Kubernetes services of type LoadBalancer create Classic Load Balancers on AWS. These are outdated. The AWS Load Balancer Controller replaces that behavior with modern ALBs (for HTTP/HTTPS) and NLBs (for TCP/UDP).


The controller watches for Ingress resources and Service annotations, then creates and configures the corresponding AWS load balancers automatically. Let’s install it:


# alb-controller.tf
module "lb_controller_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name                              = "${var.cluster_name}-lb-controller"
  attach_load_balancer_controller_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-load-balancer-controller"]
    }
  }
}

resource "helm_release" "aws_lb_controller" {
  namespace  = "kube-system"
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  version    = "1.9.2"

  set {
    name  = "clusterName"
    value = module.eks.cluster_name
  }

  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.lb_controller_irsa.iam_role_arn
  }

  set {
    name  = "vpcId"
    value = module.vpc.vpc_id
  }
}

Notice how we use IRSA here. The Load Balancer Controller needs permissions to create ALBs, manage target groups, and read subnet tags. Instead of giving those permissions to the node, we create a dedicated IAM role and bind it to the controller’s ServiceAccount.


Once installed, you can create Ingress resources that automatically provision ALBs:


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: task-api
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc-123
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: alb
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: task-api
                port:
                  number: 3000

The controller reads the annotations, creates an ALB in your public subnets, attaches the ACM certificate for TLS, and routes traffic to your pods. You do not need to manage load balancers manually anymore.
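Provisioning is asynchronous, so the Ingress exists before the ALB does. The helper below polls the Ingress status until the controller publishes the load balancer hostname. It is a sketch: the name wait_for_alb, the 10 second interval and the default of 30 attempts are all assumptions for this example.

```shell
# Poll an Ingress until its status contains a load balancer hostname.
# Prints the hostname on success; returns 1 if it never appears.
wait_for_alb() {
  local name="$1" ns="${2:-default}" tries="${3:-30}" host
  while [ "$tries" -gt 0 ]; do
    host="$(kubectl get ingress "$name" -n "$ns" \
      -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)"
    if [ -n "$host" ]; then
      echo "$host"
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  echo "no ALB hostname for ingress/$name yet" >&2
  return 1
}

# Usage:
#   wait_for_alb task-api default
```

If the hostname never shows up, the controller logs in kube-system usually explain why, for example missing subnet tags or an invalid certificate ARN.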


Configuring kubeconfig

After the cluster is provisioned, you need to configure kubectl to talk to it. The AWS CLI makes this simple:


# Update your kubeconfig
aws eks update-kubeconfig --region us-east-1 --name devops-zero-to-hero

# Verify the connection
kubectl get nodes

You should see your managed node group instances:


NAME                             STATUS   ROLES    AGE   VERSION
ip-10-0-1-42.ec2.internal       Ready    <none>   5m    v1.31.2-eks-7f9249a
ip-10-0-2-87.ec2.internal       Ready    <none>   5m    v1.31.2-eks-7f9249a

If you work with multiple clusters, you can switch between them using contexts:


# List all contexts
kubectl config get-contexts

# Switch to a specific context
kubectl config use-context arn:aws:eks:us-east-1:123456789012:cluster/devops-zero-to-hero

# Rename a context for convenience
kubectl config rename-context \
  arn:aws:eks:us-east-1:123456789012:cluster/devops-zero-to-hero \
  eks-dev

Deploying the TypeScript API to EKS

Remember the Helm chart we built in article twelve? Now we put it to use. If you have your chart in an OCI registry, the deployment is a single command:


# Create a namespace for the application
kubectl create namespace task-api

# Install the chart
helm install task-api oci://ghcr.io/your-org/charts/task-api \
  --version 0.1.0 \
  --namespace task-api \
  -f values-eks.yaml

Here is what the EKS-specific values file looks like:


# values-eks.yaml
replicaCount: 2

image:
  repository: 123456789012.dkr.ecr.us-east-1.amazonaws.com/task-api
  tag: "1.0.0"

service:
  type: ClusterIP
  port: 3000

ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc-123
    alb.ingress.kubernetes.io/healthcheck-path: /health
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/task-api-role

After the deployment completes, you can check everything is running:


# Check the pods
kubectl get pods -n task-api
NAME                        READY   STATUS    RESTARTS   AGE
task-api-6d8f9c7b4a-k2m5n   1/1     Running   0          2m
task-api-6d8f9c7b4a-x9p3r   1/1     Running   0          2m

# Check the ingress (the ALB takes a minute or two to provision)
kubectl get ingress -n task-api
NAME       CLASS   HOSTS              ADDRESS                                      PORTS   AGE
task-api   alb     api.example.com    k8s-taskapi-xxxxx.us-east-1.elb.amazonaws.com   80      3m

# Test the endpoint
curl https://api.example.com/health
{"status": "ok"}

The AWS Load Balancer Controller sees the Ingress resource, creates an ALB, configures target groups pointing to your pod IPs, and attaches the TLS certificate. Traffic flows from the internet through the ALB directly to your pods.


Storage: EBS CSI driver

If your workloads need persistent storage (databases, caches, file uploads), you need the EBS CSI driver. This driver allows Kubernetes PersistentVolumes to be backed by EBS volumes.


Add it as an EKS add-on in your Terraform:


# Add to the cluster_addons in eks.tf
cluster_addons = {
  coredns                = {}
  eks-pod-identity-agent = {}
  kube-proxy             = {}
  vpc-cni                = {}
  aws-ebs-csi-driver = {
    service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
  }
}

# ebs-csi.tf
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name             = "${var.cluster_name}-ebs-csi"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

Then create a StorageClass and use it in your workloads:


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3
  resources:
    requests:
      storage: 10Gi

The WaitForFirstConsumer binding mode is important. It delays volume creation until a pod actually needs it, ensuring the volume is created in the same availability zone as the pod. Without this, you can end up with a volume in one AZ and a pod that needs to run in another.
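You can observe this behavior directly: the claim stays Pending until a pod that mounts it is scheduled, and only then is the EBS volume created. The sketch below creates such a pod. The function name create_pvc_consumer, the pod name and the busybox image are assumptions for this example.

```shell
# Create a pod that mounts the given PersistentVolumeClaim at /data.
create_pvc_consumer() {
  local claim="$1" ns="${2:-default}"
  kubectl apply -n "$ns" -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${claim}-consumer
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable
      command: ["sh", "-c", "echo ok > /data/probe && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${claim}
EOF
}

# Usage:
#   kubectl get pvc data-volume     # STATUS is Pending before any pod uses it
#   create_pvc_consumer data-volume
#   kubectl get pvc data-volume     # STATUS becomes Bound once the pod is scheduled
```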


Cost considerations

EKS is not cheap, especially compared to ECS with Fargate for small workloads. Here is what you are paying for:


  • Control plane: $0.10/hour ($73/month). This is fixed regardless of how many nodes you run.
  • Worker nodes: Standard EC2 pricing. A t3.medium (2 vCPU, 4 GB) runs about $30/month on-demand.
  • Spot instances: Up to 90% cheaper than on-demand, but can be interrupted. Karpenter makes using Spot easy by diversifying across instance types. Great for stateless workloads, not recommended for databases.
  • NAT gateway: $32/month plus data transfer. This is often the sneaky cost that surprises people. Use a single NAT gateway for dev, one per AZ for production.
  • Load balancers: ALBs cost about $16/month plus data transfer. Multiple Ingress resources can share a single ALB using IngressGroups, so you do not have to provision one per service.
  • Data transfer: Inter-AZ traffic costs $0.01/GB each way. Cross-AZ pod-to-pod communication adds up in chatty microservice architectures.

Cost saving tips:


  • Use Karpenter with Spot instances for stateless workloads. Diversify across many instance types to reduce interruption rates.
  • Right-size your nodes. Karpenter helps here by picking the optimal instance type for your workload mix.
  • Consolidate ALBs using IngressGroup annotations so multiple services share one ALB.
  • Use a single NAT gateway for non-production environments.
  • Set resource requests and limits on every pod so Karpenter can bin-pack efficiently.
  • Consider Savings Plans or Reserved Instances for baseline capacity you know you will always need.

A minimal EKS dev environment (control plane + 2 t3.medium nodes + NAT gateway + ALB) costs roughly $180/month. A production setup with more nodes, multi-AZ NAT, and monitoring will be significantly more. Compare this to ECS with Fargate where you only pay for the compute your containers actually use.
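The estimate is simple addition over the figures quoted above. The sketch below reproduces it and also shows where the $73 comes from, assuming 730 hours in an average month. Data transfer is left out because it depends entirely on your traffic.

```shell
# Convert an hourly price to a monthly one, assuming 730 hours per month.
monthly() { awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'; }

# Monthly figures quoted in this section, in whole dollars.
control_plane=73
node=30
nat=32
alb=16
nodes=2

total=$(( control_plane + nodes * node + nat + alb ))

echo "control plane from hourly rate: \$$(monthly 0.10)"
echo "dev environment estimate: \$${total}/month"
```

This prints $73.00 for the control plane and $181/month for the whole environment, which is the roughly $180 figure above.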


Putting it all together

Let’s run through the full provisioning flow:


# Initialize Terraform
cd eks-cluster/terraform
terraform init

# Review the plan
terraform plan -out=tfplan

# Apply (this takes 15-20 minutes, mostly the EKS cluster creation)
terraform apply tfplan

# Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name devops-zero-to-hero

# Verify the cluster
kubectl get nodes
kubectl get pods -n kube-system

# Apply Karpenter resources
kubectl apply -f karpenter-nodepool.yaml

# Deploy the application
kubectl create namespace task-api
helm install task-api oci://ghcr.io/your-org/charts/task-api \
  --version 0.1.0 \
  --namespace task-api \
  -f values-eks.yaml

# Check everything is running
kubectl get all -n task-api

After about 20 minutes, you will have a fully functional EKS cluster with managed node groups, Karpenter for dynamic scaling, the AWS Load Balancer Controller for automated ALB provisioning, IRSA for secure pod-level AWS permissions, and the EBS CSI driver for persistent storage.


Cleaning up

If you are following along and do not want to keep the cluster running, tear it down:


# Remove application resources first
helm uninstall task-api -n task-api
kubectl delete -f karpenter-nodepool.yaml

# Destroy everything with Terraform
terraform destroy

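Always remove Kubernetes resources before destroying the infrastructure. The ALBs created by the AWS Load Balancer Controller are not in Terraform's state, so Terraform does not know it has to delete them. If they still exist when you run terraform destroy, the subnets and the VPC cannot be removed, Terraform will hang and eventually fail, and you will have to clean up the load balancers manually in the AWS console.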
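Before running terraform destroy, it helps to check that nothing in the cluster still owns a load balancer. The sketch below counts the remaining Ingress objects and Services of type LoadBalancer. The function name lb_owners_remaining is made up for this example.

```shell
# Count Ingress objects and LoadBalancer Services left in the cluster.
# Both can own AWS load balancers that Terraform does not know about.
lb_owners_remaining() {
  local ingresses services
  ingresses="$(kubectl get ingress --all-namespaces --no-headers 2>/dev/null | wc -l | tr -d ' ')"
  # With --all-namespaces the columns are NAMESPACE, NAME, TYPE, ...
  services="$(kubectl get services --all-namespaces --no-headers 2>/dev/null \
    | awk '$3 == "LoadBalancer"' | wc -l | tr -d ' ')"
  echo $(( ingresses + services ))
}

# Usage:
#   [ "$(lb_owners_remaining)" -eq 0 ] && terraform destroy
```

An Ingress that is being deleted stays visible until the controller has removed its ALB, so a count of zero is a reasonable signal that it is safe to proceed.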


Closing notes

EKS gives you the full power of Kubernetes without the operational burden of managing the control plane. In this article we provisioned a complete cluster with Terraform, configured managed node groups for baseline compute, set up Karpenter for intelligent autoscaling, used IRSA for secure pod-level AWS permissions, installed the AWS Load Balancer Controller for automated ALB management, and deployed our TypeScript API from the Helm chart we built in the previous article.


The trade-off compared to ECS is complexity and cost. EKS requires more infrastructure knowledge, more moving parts, and a baseline cost even when nothing is running. But in return you get the entire Kubernetes ecosystem, portability across clouds, and the ability to handle complex workloads that would be difficult to model in ECS.


In the next article we will dive into monitoring and observability, because having a running cluster is only the beginning. You need to know what is happening inside it.


Hope you found this useful and enjoyed reading it, until next time!


Errata

If you spot any error or have any suggestion, please send me a message so it gets fixed.

Also, you can check the source code and changes in the sources here


