Kubernetes cost optimisation without sacrificing reliability
Right-sizing pods, Spot/Preemptible nodes, Karpenter autoscaling, and bin-packing strategies that actually work.
The two failure modes
Kubernetes clusters tend toward one of two failure modes: over-provisioned (expensive, reliable) or under-provisioned (cheap, unreliable). The goal is neither — it's right-sized, with reliability maintained through architecture rather than padding.
Right-sizing workloads
Start with kubectl top pods and the VPA (Vertical Pod Autoscaler) in recommendation mode. Don't apply VPA in auto mode to production — let it recommend, then apply manually after review.
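As a sketch, a recommendation-only VPA looks like this (the Deployment name payments-api is a hypothetical placeholder; adjust to your workload):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api   # hypothetical workload name
  updatePolicy:
    updateMode: "Off"    # recommend only; never evicts or mutates pods
```

Read the recommendations with kubectl describe vpa payments-api-vpa, then fold them into your manifests after review.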
A common pattern we see: request/limit ratios of 1:4 or worse. A pod that requests 256Mi but is limited to 1Gi can burst to four times the footprint the scheduler planned for, which leads to node overcommit and memory pressure. Tighten limits to 1.5–2x requests for most stateless services.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"     # 2x request
    memory: "512Mi" # 2x request
Karpenter vs Cluster Autoscaler
On AWS, replace the Cluster Autoscaler with Karpenter. Karpenter provisions nodes directly (no managed node groups required), understands pod requirements at scheduling time, and can consolidate workloads onto fewer nodes automatically.
A NodePool that uses Spot instances with on-demand fallback:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:            # required in the v1 API; references an
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # EC2NodeClass you have defined separately
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m6i.large", "m6i.xlarge"]
When both capacity types are allowed, Karpenter prefers Spot and falls back to on-demand when Spot capacity is unavailable; interrupted Spot nodes are drained and replaced automatically. Combined with PodDisruptionBudgets, this can cut node costs by 60–70% with minimal reliability impact.
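A minimal PodDisruptionBudget sketch, assuming a Deployment labelled app: payments-api (a hypothetical name) running in the team-payments namespace:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
  namespace: team-payments
spec:
  minAvailable: 2          # keep at least 2 replicas up during voluntary
  selector:                # disruptions (consolidation, Spot drain, upgrades)
    matchLabels:
      app: payments-api
```

With this in place, Karpenter's consolidation and Spot interruption handling will drain nodes gradually rather than taking all replicas down at once.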
Bin-packing and consolidation
Enable Karpenter's consolidation policy:
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized # renamed from WhenUnderutilized in the v1 API
    consolidateAfter: 30s
Karpenter will replace multiple under-utilised nodes with a single larger node. In our experience, this typically reduces node count by 20–35% with no application changes.
Namespace-level resource quotas
Enforce resource quotas per namespace so no single team can starve the cluster:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
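One gotcha worth noting: once a quota constrains requests and limits, any pod that omits them is rejected outright. A LimitRange in the same namespace supplies defaults so unannotated pods still schedule (the values below are illustrative placeholders, not recommendations):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-payments
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:              # applied when a container sets no limits
        cpu: "200m"
        memory: "256Mi"
```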
What to measure
Track CPU and memory utilisation at the cluster level (target 60–70% average), cost per namespace using Kubecost or OpenCost, and Spot interruption rates. Aim for less than 2 interruptions per week per critical workload.
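For a quick read on the utilisation target without any extra tooling, kubectl can show live usage directly (requires metrics-server; the namespace name is a hypothetical example):

```shell
# Live usage per node, to compare against the 60–70% average target
kubectl top nodes

# Heaviest pods in one namespace, sorted by memory
kubectl top pods -n team-payments --sort-by=memory
```

For per-namespace cost rather than raw utilisation, Kubecost or OpenCost layer pricing data on top of these same metrics.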
We've reduced Kubernetes costs by 40–60% on every cluster we've optimised, without degrading SLOs. Book a call to discuss your cluster.