The cost of over-provisioning, measured | The Cloud Ledger

When I inherited a platform team's AWS account, the running joke was "we size for Black Friday every day of the year." Every service ran on instances two sizes larger than it needed, every Auto Scaling group had a generous minimum, and every RDS instance was provisioned for a peak that arrived twice a year. Nobody could tell me what that headroom actually cost. So I measured it.

This post is about putting a real number on over-provisioning, because "we're probably wasting money" never gets prioritized, but "we're burning $14k a month on idle CPU" does.

The metric that matters: utilization vs. provisioned

Over-provisioning is the gap between what you pay for and what you use. For compute, the cleanest proxy is average and p95 CPU utilization against provisioned vCPUs. I pulled 30 days of CloudWatch data per instance and bucketed it:

p95 CPU	Interpretation	Action
< 10%	Severely over-provisioned	Downsize 2+ steps or consolidate
10-40%	Over-provisioned	Downsize one step
40-70%	Healthy	Leave it
> 70%	Tight	Watch; maybe scale up
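The bucketing in the table is simple enough to express directly. This is a minimal sketch, not the script used for this post. The function name `classify` is hypothetical. The table doesn't say which bucket owns an exact boundary value, so the sketch assumes that exactly 10% or 40% belongs to the bucket that starts there, and that exactly 70% still counts as healthy.

```python
def classify(p95_cpu: float) -> tuple[str, str]:
    """Map a 30-day p95 CPU percentage to the table's (interpretation, action).

    Boundary handling is an assumption: 10 and 40 go to the bucket that
    starts there, and 70 is still treated as healthy.
    """
    if p95_cpu < 10:
        return ("Severely over-provisioned", "Downsize 2+ steps or consolidate")
    if p95_cpu < 40:
        return ("Over-provisioned", "Downsize one step")
    if p95_cpu <= 70:
        return ("Healthy", "Leave it")
    return ("Tight", "Watch; maybe scale up")
```

For example, `classify(11.0)` returns `("Over-provisioned", "Downsize one step")`.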

The eye-opener: across 140 instances, the median p95 CPU was 11%. The typical instance was provisioned for roughly 9x the compute it used at peak (100 / 11 ≈ 9).
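The multiplier follows from the definition: an instance whose p95 CPU is p% used p/100 of its vCPUs at its busiest, so it was provisioned at 100/p times that peak usage. The fleet-level figure simply applies this to the median. A small sketch of the arithmetic; the helper name `overprovision_factor` is hypothetical.

```python
from statistics import median


def overprovision_factor(p95_values: list[float]) -> float:
    """Provisioned-to-used ratio at peak for the median instance.

    p95_values are per-instance p95 CPU percentages (0-100). A p95 of p%
    means the instance was provisioned at 100/p times its peak usage.
    """
    m = median(p95_values)
    if m <= 0:
        raise ValueError("median p95 must be positive")
    return 100.0 / m
```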

Turning utilization into dollars

Utilization percentages don't move budgets; dollars do. I joined CloudWatch utilization to the on-demand price of each instance type to estimate recoverable spend from right-sizing each candidate one step down.

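import datetime

import boto3

cw = boto3.client("cloudwatch")

# On-demand Linux rates in USD/hour (us-east-1). Halving the size roughly
# halves the hourly rate for most families; within m5 it halves exactly.
PRICE = {"m5.4xlarge": 0.768, "m5.2xlarge": 0.384,
         "m5.xlarge": 0.192, "m5.large": 0.096}

# One step down within the family; a type with no entry is already the smallest.
DOWNSIZE = {"m5.4xlarge": "m5.2xlarge", "m5.2xlarge": "m5.xlarge",
            "m5.xlarge": "m5.large"}

def p95_cpu(instance_id, days=30):
    """Worst hourly p95 CPUUtilization over the window, or None if no data."""
    end = datetime.datetime.now(datetime.timezone.utc)  # utcnow() is deprecated
    start = end - datetime.timedelta(days=days)
    r = cw.get_metric_statistics(
        Namespace="AWS/EC2", MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        # 30 days of hourly periods = 720 points, under the 1,440-point limit
        StartTime=start, EndTime=end, Period=3600,
        ExtendedStatistics=["p95"],
    )
    vals = [d["ExtendedStatistics"]["p95"] for d in r["Datapoints"]]
    # The max of the hourly p95s is a deliberately conservative peak.
    # Return None rather than 0.0 so an instance with no data never looks idle.
    return max(vals) if vals else None

def monthly_saving(itype, next_itype):
    """Estimated monthly saving from moving itype to next_itype (e.g. DOWNSIZE[itype])."""
    delta_hour = PRICE[itype] - PRICE[next_itype]
    return round(delta_hour * 730, 2)  # 730 hrs/month = 8,760 / 12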

Running that across the fleet produced a single ranked spreadsheet: instance, current type, p95 CPU, recommended type, and estimated monthly saving. The total at the bottom was the number that finally got engineering time allocated.
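The ranking step can be sketched without any AWS calls once the p95 values are in hand. This assumes fleet data arrives as `(instance_id, instance_type, p95_cpu)` tuples (for example, collected with `p95_cpu` above) and reuses the post's m5 price table; `build_report`, `to_csv`, and the `DOWNSIZE` map are hypothetical names for illustration. It recommends only one step down, matching the estimate described above, even for instances the table would flag for two or more steps.

```python
import csv
import io

# Hourly on-demand prices from the post's PRICE table (m5, us-east-1 Linux).
PRICE = {"m5.4xlarge": 0.768, "m5.2xlarge": 0.384,
         "m5.xlarge": 0.192, "m5.large": 0.096}
# Hypothetical one-step-down map; extend it for the families in your fleet.
DOWNSIZE = {"m5.4xlarge": "m5.2xlarge", "m5.2xlarge": "m5.xlarge",
            "m5.xlarge": "m5.large"}
HOURS_PER_MONTH = 730


def build_report(fleet, max_p95=40.0):
    """Rank downsizing candidates by estimated monthly saving, largest first.

    fleet: iterable of (instance_id, instance_type, p95_cpu) tuples.
    Skips instances with no data, p95 at or above max_p95, or no smaller size.
    """
    rows = []
    for instance_id, itype, p95 in fleet:
        target = DOWNSIZE.get(itype)
        if p95 is None or p95 >= max_p95 or target is None:
            continue
        saving = round((PRICE[itype] - PRICE[target]) * HOURS_PER_MONTH, 2)
        rows.append((instance_id, itype, p95, target, saving))
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows


def to_csv(rows):
    """Render the ranked rows plus a TOTAL line as CSV text."""
    buf = io.StringIO()
    w = csv.writer(buf)
    w.writerow(["instance", "current_type", "p95_cpu",
                "recommended_type", "monthly_saving_usd"])
    w.writerows(rows)
    w.writerow(["TOTAL", "", "", "", round(sum(r[4] for r in rows), 2)])
    return buf.getvalue()
```

Keeping the ranking separate from the CloudWatch calls makes it easy to rerun with different thresholds without re-querying 30 days of metrics.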

Over-provisioning isn't a bug; it's a bet on traffic you might not get. The cost of that bet is the premium you pay every hour the traffic doesn't show up.

The hidden multipliers

Raw instance cost understates the waste, because over-provisioning compounds:

EBS scales with intent. Bigger instances often got bigger gp3 volumes "to be safe," and gp3 is billed on provisioned size, not on how much data is actually stored.
Data transfer and NAT. Over-built multi-AZ topologies add NAT gateways that bill by the hour whether or not traffic flows, plus cross-AZ transfer charges that come from the topology rather than from user load.
Commitment lock-in. Reserved Instances or Savings Plans bought against an over-provisioned baseline lock the waste in for one to three years.
Carbon and quota. Idle capacity still counts against account quotas such as vCPU limits and still has a carbon footprint; both matter to some orgs.
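A rough way to put the first three multipliers into dollars is sketched here. The prices are assumptions (us-east-1 list prices for gp3 storage, NAT gateway hours, and cross-AZ transfer at $0.01/GB on each side; check the current pricing pages for your region), the function names are hypothetical, and the inputs (unused provisioned GB, NAT gateways you could remove, avoidable cross-AZ GB) have to come from your own inventory.

```python
# Assumed us-east-1 list prices in USD; verify against current AWS pricing.
GP3_PER_GB_MONTH = 0.08  # gp3 storage, billed on provisioned GB
NAT_PER_HOUR = 0.045     # per NAT gateway-hour, excluding per-GB processing
CROSS_AZ_PER_GB = 0.02   # $0.01/GB charged on each side of a cross-AZ transfer
HOURS_PER_MONTH = 730


def hidden_monthly_waste(provisioned_gb, used_gb,
                         extra_nat_gateways, avoidable_cross_az_gb):
    """Monthly cost of oversized gp3 volumes, surplus NAT gateways,
    and avoidable cross-AZ transfer, under the assumed prices above."""
    ebs = max(provisioned_gb - used_gb, 0) * GP3_PER_GB_MONTH
    nat = extra_nat_gateways * NAT_PER_HOUR * HOURS_PER_MONTH
    xaz = avoidable_cross_az_gb * CROSS_AZ_PER_GB
    return round(ebs + nat + xaz, 2)


def locked_in_waste(monthly_waste, term_years):
    """Total waste a commitment locks in over its term, given the monthly
    waste at the committed (discounted) rate."""
    return round(monthly_waste * 12 * term_years, 2)
```

The commitment helper is the reason to right-size before buying Reserved Instances or Savings Plans: whatever waste is in the baseline gets multiplied by 12 or 36 months.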

Right-sizing without an incident

The fear is always "if we downsize, we'll fall over at peak." I de-risked it:

Start with non-prod, where the data shows even worse utilization, and bank the easy savings.
Cross-check CloudWatch with Compute Optimizer recommendations, which factor in memory (when the CloudWatch agent publishes it) and network, not just CPU.
Move to autoscaling so the floor is low and peak is handled by scaling out, not by a permanently large floor.
Change one step at a time and watch p95 for a week before the next step.
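The one-step rule can be written as a small gate that a resize script, or a human, checks before each change. A minimal sketch, assuming a single m5 size ladder, a seven-day observation window, and the table's thresholds (step down below 40% p95, step back up above 70%); `LADDER`, `Observation`, and `next_action` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical size ladder for one family, largest first.
LADDER = ["m5.4xlarge", "m5.2xlarge", "m5.xlarge", "m5.large"]


@dataclass
class Observation:
    instance_type: str
    days_observed: int  # days since the last resize
    p95_cpu: float      # p95 CPU (%) over those days


def next_action(obs: Observation, min_days: int = 7,
                target_max: float = 40.0, tight: float = 70.0) -> str:
    """Return the instance type to run next.

    Holds until min_days of data exist, rolls back one size if p95 is in
    the 'Tight' bucket, and otherwise steps down at most one size.
    """
    i = LADDER.index(obs.instance_type)  # ValueError for types not on the ladder
    if obs.days_observed < min_days:
        return obs.instance_type          # not enough data yet; keep watching
    if obs.p95_cpu > tight and i > 0:
        return LADDER[i - 1]              # roll back one step
    if obs.p95_cpu < target_max and i < len(LADDER) - 1:
        return LADDER[i + 1]              # one step down, never two
    return obs.instance_type
```

For CPU-bound work, halving vCPUs roughly doubles utilization, so an instance stepping down from just under 40% can land in the tight bucket; the rollback branch is what keeps that from becoming an incident. The caller is assumed to reset `days_observed` after every resize.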

The result on this account: a 34% reduction in EC2 spend over two months, with no degradation in p99 latency, because the headroom we removed was never being used in the first place.

Takeaways

Measure the gap between provisioned and used capacity with 30 days of p95 utilization; a median p95 CPU in the low teens means at least half the fleet sits at or near the severely over-provisioned threshold.
Translate utilization into ranked monthly dollar savings; percentages don't get prioritized, dollars do.
Account for hidden multipliers: oversized EBS, cross-AZ transfer, and commitments bought against an inflated baseline.
De-risk right-sizing with non-prod first, Compute Optimizer cross-checks, autoscaling, and one step at a time.
