Kubernetes cost optimisation without sacrificing reliability
Right-sizing pods, Spot/Preemptible nodes, Karpenter autoscaling, and bin-packing strategies that actually work.
The two failure modes
Kubernetes clusters tend toward one of two failure modes: over-provisioned (expensive, reliable) or under-provisioned (cheap, unreliable). The goal is neither — it's right-sized, with reliability maintained through architecture rather than padding.
Right-sizing workloads
Start with kubectl top pods and the VPA (Vertical Pod Autoscaler) in recommendation mode. Don't apply VPA in auto mode to production — let it recommend, then apply manually after review.
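As a sketch, a recommendation-only VPA looks like this (the Deployment name payments-api is a hypothetical placeholder; adjust to your workload):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api   # hypothetical workload name
  updatePolicy:
    updateMode: "Off"    # recommend only; never evicts or mutates pods
```

Read the recommendations with kubectl describe vpa payments-api-vpa, then fold them into your manifests after review.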
A common pattern we see: request/limit ratios of 1:4 or worse. A pod that requests 256Mi but is limited to 1Gi can burst to four times the footprint the scheduler planned for, which leads to node overcommit and memory pressure. Tighten limits to 1.5–2x requests for most stateless services.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"     # 2x request
    memory: "512Mi" # 2x request
Karpenter vs Cluster Autoscaler
On AWS, replace the Cluster Autoscaler with Karpenter. Karpenter provisions nodes directly (no managed node groups required), understands pod requirements at scheduling time, and can consolidate workloads onto fewer nodes automatically.
A NodePool that uses Spot instances with on-demand fallback:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:            # required in the v1 API; references an
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # EC2NodeClass you have defined separately
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m6i.large", "m6i.xlarge"]
When both capacity types are allowed, Karpenter prefers Spot and falls back to on-demand when Spot capacity is unavailable; interrupted Spot nodes are drained and replaced automatically. Combined with PodDisruptionBudgets, this can cut node costs by 60–70% with minimal reliability impact.
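A minimal PodDisruptionBudget sketch, assuming a Deployment labelled app: payments-api (a hypothetical name) running in the team-payments namespace:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: team-payments
spec:
  minAvailable: 2          # keep at least 2 replicas up during voluntary
  selector:                # disruptions (consolidation, Spot drain, upgrades)
    matchLabels:
      app: payments-api
```

With this in place, Karpenter's consolidation and Spot interruption handling will drain nodes gradually rather than taking all replicas down at once.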
Bin-packing and consolidation
Enable Karpenter's consolidation policy:
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized # renamed from WhenUnderutilized in the v1 API
    consolidateAfter: 30s
Karpenter will replace multiple under-utilised nodes with a single larger node. In our experience, this typically reduces node count by 20–35% with no application changes.
Namespace-level resource quotas
Enforce resource quotas per namespace so no single team can starve the cluster:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
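One gotcha worth noting: once a quota constrains requests and limits, any pod that omits them is rejected outright. A LimitRange in the same namespace supplies defaults so unannotated pods still schedule (the values below are illustrative placeholders, not recommendations):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-payments
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:              # applied when a container sets no limits
        cpu: "200m"
        memory: "256Mi"
```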
What to measure
Track CPU and memory utilisation at the cluster level (target 60–70% average), cost per namespace using Kubecost or OpenCost, and Spot interruption rates. Aim for less than 2 interruptions per week per critical workload.
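For a quick read on the utilisation target without any extra tooling, kubectl can show live usage directly (requires metrics-server; the namespace name is a hypothetical example):

```shell
# Live usage per node, to compare against the 60–70% average target
kubectl top nodes

# Heaviest pods in one namespace, sorted by memory
kubectl top pods -n team-payments --sort-by=memory
```

For per-namespace cost rather than raw utilisation, Kubecost or OpenCost layer pricing data on top of these same metrics.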
We've reduced Kubernetes costs by 40–60% on every cluster we've optimised, without degrading SLOs. Book a call to discuss your cluster.