The cost of over-provisioning, measured
A back-of-envelope model for what idle headroom really costs you per year.
When I inherited a platform team's AWS account, the running joke was "we size for Black Friday every day of the year." Every service ran on instances two sizes larger than it needed, every Auto Scaling group had a generous minimum, and every RDS instance was provisioned for a peak that arrived twice a year. Nobody could tell me what that headroom actually cost. So I measured it.
This post is about putting a real number on over-provisioning, because "we're probably wasting money" never gets prioritized, but "we're burning $14k a month on idle CPU" does.
The metric that matters: utilization vs. provisioned
Over-provisioning is the gap between what you pay for and what you use. For compute, the cleanest proxy is average and p95 CPU utilization against provisioned vCPUs. I pulled 30 days of CloudWatch data per instance and bucketed it:
| p95 CPU | Interpretation | Action |
|---|---|---|
| < 10% | Severely over-provisioned | Downsize 2+ steps or consolidate |
| 10-40% | Over-provisioned | Downsize one step |
| 40-70% | Healthy | Leave it |
| > 70% | Tight | Watch; maybe scale up |
The eye-opener: across 140 instances, the median p95 CPU was 11%. For the typical instance, we were paying for roughly 9x the compute it used at p95 (100 / 11 ≈ 9).
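A minimal sketch of the bucketing, assuming each lower bound is inclusive (the table leaves boundary values ambiguous). The function name `classify`, the instance IDs, and the sample readings are hypothetical and are not the real fleet data.

```python
from statistics import median

def classify(p95_cpu: float) -> tuple[str, str]:
    """Map a p95 CPU percentage to the (interpretation, action) row of the table."""
    if p95_cpu < 10:
        return ("Severely over-provisioned", "Downsize 2+ steps or consolidate")
    if p95_cpu < 40:
        return ("Over-provisioned", "Downsize one step")
    if p95_cpu <= 70:
        return ("Healthy", "Leave it")
    return ("Tight", "Watch; maybe scale up")

# Hypothetical p95 readings per instance, for illustration only
sample_fleet = {"i-aaa": 4.0, "i-bbb": 11.0, "i-ccc": 55.0,
                "i-ddd": 82.0, "i-eee": 9.5}

fleet_median = median(sample_fleet.values())
# Provisioned capacity per unit of capacity actually used at p95
headroom_ratio = 100 / fleet_median

buckets = {iid: classify(p95)[0] for iid, p95 in sample_fleet.items()}
```

The median is the useful summary here because a handful of busy instances would otherwise pull a mean upward and hide how idle the bulk of the fleet is.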
Turning utilization into dollars
Utilization percentages don't move budgets; dollars do. I joined CloudWatch utilization to the on-demand price of each instance type to estimate recoverable spend from right-sizing each candidate one step down.
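```python
import datetime

import boto3

cw = boto3.client("cloudwatch")

# On-demand USD/hour (region-dependent); halving the size roughly
# halves the hourly rate for most families
PRICE = {"m5.4xlarge": 0.768, "m5.2xlarge": 0.384,
         "m5.xlarge": 0.192, "m5.large": 0.096}

def p95_cpu(instance_id, days=30):
    """Approximate p95 CPU over the window from hourly p95 datapoints."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=days)
    r = cw.get_metric_statistics(
        Namespace="AWS/EC2", MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start, EndTime=end, Period=3600,
        ExtendedStatistics=["p95"],
    )
    vals = sorted(d["ExtendedStatistics"]["p95"] for d in r["Datapoints"])
    if not vals:
        return 0.0  # no datapoints in the window
    # 95th percentile of the hourly values; max() would report the single
    # worst hour rather than a p95
    return vals[int(0.95 * (len(vals) - 1))]

def monthly_saving(itype, next_itype):
    """Estimated monthly saving from moving itype to next_itype."""
    delta_hour = PRICE[itype] - PRICE[next_itype]
    return round(delta_hour * 730, 2)  # ~730 hours per month
```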
Running that across the fleet produced a single ranked spreadsheet: instance, current type, p95 CPU, recommended type, and estimated monthly saving. The total at the bottom was the number that finally got engineering time allocated.
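The ranking step could look roughly like the sketch below. It is an illustration under assumptions, not the exact script: `STEP_DOWN`, `build_report`, the 40% cut-off (borrowed from the bucket table), and the sample rows are all hypothetical, and the hourly rates are on-demand prices that vary by region.

```python
HOURS_PER_MONTH = 730

# On-demand USD/hour; check current pricing for your region
PRICE = {"m5.4xlarge": 0.768, "m5.2xlarge": 0.384,
         "m5.xlarge": 0.192, "m5.large": 0.096}

# Hypothetical one-step-down map within the m5 family
STEP_DOWN = {"m5.4xlarge": "m5.2xlarge", "m5.2xlarge": "m5.xlarge",
             "m5.xlarge": "m5.large"}

def build_report(fleet, threshold=40.0):
    """Return downsizing candidates ranked by estimated monthly saving.

    fleet: iterable of (instance_id, instance_type, p95_cpu) tuples.
    """
    rows = []
    for instance_id, itype, p95 in fleet:
        target = STEP_DOWN.get(itype)
        if target is None or p95 >= threshold:
            continue  # already the smallest size, or not over-provisioned
        saving = round((PRICE[itype] - PRICE[target]) * HOURS_PER_MONTH, 2)
        rows.append({"instance": instance_id, "current": itype,
                     "p95_cpu": p95, "recommended": target,
                     "monthly_saving": saving})
    # Largest saving first, so the top of the sheet is where to start
    return sorted(rows, key=lambda r: r["monthly_saving"], reverse=True)

# Made-up rows for illustration
report = build_report([
    ("i-001", "m5.xlarge", 8.0),
    ("i-002", "m5.4xlarge", 12.0),
    ("i-003", "m5.2xlarge", 65.0),
])
total = round(sum(r["monthly_saving"] for r in report), 2)
```

Sorting by saving rather than by utilization matters: one idle large instance is worth more engineering time than several idle small ones.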
Over-provisioning isn't a bug; it's a bet on traffic you might not get. The cost of that bet is the premium you pay every hour the traffic doesn't show up.
The hidden multipliers
Raw instance cost understates the waste, because over-provisioning compounds:
- EBS scales with intent. Bigger instances often got bigger gp3 volumes "to be safe," billed whether used or not.
- Data transfer and NAT. Over-built multi-AZ topologies add cross-AZ transfer charges that have nothing to do with load.
- Commitment lock-in. Reserved Instances or Savings Plans bought against an over-provisioned baseline lock the waste in for one to three years.
- Carbon and quota. Idle capacity still counts against account quotas and still produces emissions, both of which matter to some orgs.
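To put rough numbers on the EBS and commitment multipliers, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the function names, the sample inputs, and the per-GiB rate, which should be replaced with the current gp3 storage price for your region. It covers storage capacity only and ignores any extra provisioned IOPS or throughput.

```python
def ebs_idle_cost(provisioned_gib, used_gib, usd_per_gib_month):
    """Monthly cost of provisioned-but-unused volume capacity."""
    unused = max(provisioned_gib - used_gib, 0)
    return round(unused * usd_per_gib_month, 2)

def locked_in_waste(monthly_waste_usd, months_remaining):
    """Waste that a commitment fixes in place until the term ends."""
    return round(monthly_waste_usd * months_remaining, 2)

# Hypothetical volume: 500 GiB provisioned, 120 GiB used,
# at an assumed rate of 0.08 USD per GiB-month
ebs = ebs_idle_cost(500, 120, 0.08)

# Hypothetical commitment: 1,000 USD/month of idle baseline, 24 months left
locked = locked_in_waste(1000, 24)
```

The second function is the reason to right-size before buying commitments: the same monthly waste becomes a fixed liability for the rest of the term.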
Right-sizing without an incident
The fear is always "if we downsize, we'll fall over at peak." I de-risked it:
- Start with non-prod, where the data shows even worse utilization, and bank the easy savings.
- Cross-check CloudWatch with Compute Optimizer recommendations, which factor memory and network, not just CPU.
- Move to autoscaling so the floor is low and peak is handled by scaling out, not by a permanently large floor.
- Change one step at a time and watch p95 for a week before the next step.
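The "one step, then watch for a week" rule can be written down as a small gate. This is a sketch under assumptions: the function name `next_move`, the returned labels, and the use of the bucket table's 40% and 70% boundaries as decision thresholds are hypothetical choices, not a prescribed policy.

```python
def next_move(daily_p95_readings, min_days=7):
    """Decide what to do after a one-step downsize.

    daily_p95_readings: p95 CPU percentages, one per day since the change.
    """
    if len(daily_p95_readings) < min_days:
        return "wait"             # not enough observation time yet
    worst = max(daily_p95_readings)
    if worst > 70:
        return "roll back"        # now in the "Tight" bucket
    if worst >= 40:
        return "stop"             # landed in the "Healthy" bucket
    return "step down again"      # still over-provisioned

# Hypothetical week of readings after a downsize
decision = next_move([20, 25, 22, 30, 28, 21, 35])
```

Judging on the worst day of the week rather than the average keeps the gate conservative, which is the point when the fear is falling over at peak.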
The result on this account: a 34% reduction in EC2 spend over two months, with no degradation in p99 latency, because the headroom we removed was never being used in the first place.
Takeaways
- Measure the gap between provisioned and used capacity with 30 days of p95 utilization; a fleet-wide median p95 CPU near 10% means most instances can drop at least one size.
- Translate utilization into ranked monthly dollar savings; percentages don't get prioritized, dollars do.
- Account for hidden multipliers: oversized EBS, cross-AZ transfer, and commitments bought against an inflated baseline.
- De-risk right-sizing with non-prod first, Compute Optimizer cross-checks, autoscaling, and one step at a time.