Your 2026 AWS cost optimization playbook
An updated, prioritized list of cost levers, highest impact first.
Every year I do a top-to-bottom cost pass on the accounts I run, and every year the same surprise repeats: the savings are not where people assume. Nobody's bill is dominated by the thing they obsess over in code review. It is dominated by idle capacity, mispriced commitments, and storage nobody ever deletes.
This is my 2026 playbook, ordered the way I actually work it: visibility first, then the big structural levers, then the long tail. The goal is durable savings, not a one-time scrub whose waste creeps back in three months.
Step 1: See the bill before you touch it
You cannot optimize what you cannot attribute. Before any change:
- Enable the Cost and Usage Report (CUR), delivered to S3, and query it with Athena. Cost Explorer is fine for trends; the CUR is where the truth lives.
- Enforce a tagging policy (team, service, env) so spend maps to owners. Untagged spend is unowned spend.
- Set anomaly detection and budget alerts so a runaway resource pages someone the same day, not at month-end.
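Before the CUR pipeline is in place, Cost Explorer's API can already put a rough number on unowned spend. The following is a minimal sketch, assuming the team tag has been activated as a cost allocation tag; the function name spend_by_team and the dates are placeholders, not part of any AWS tooling.

```shell
# Sketch: one month's spend grouped by the "team" tag, via the Cost Explorer API.
# Assumptions: "team" is activated as a cost allocation tag; dates are placeholders.
# Spend with no team tag typically appears under the bare key "team$".
spend_by_team() (
  start="$1"   # inclusive, e.g. 2026-01-01
  end="$2"     # exclusive, e.g. 2026-02-01
  aws ce get-cost-and-usage \
    --time-period "Start=${start},End=${end}" \
    --granularity MONTHLY \
    --metrics UnblendedCost \
    --group-by Type=TAG,Key=team \
    --query 'ResultsByTime[].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output text
)

# Usage: spend_by_team 2026-01-01 2026-02-01
```

If the untagged row is one of the largest, fix tagging before chasing any other lever.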
Step 2: Pay less for compute you already need
The biggest single lever is almost always compute pricing. Three stacked moves:
| Lever | Typical saving | Trade-off |
|---|---|---|
| Graviton (arm64) | ~20% better price/perf | rebuild/test on arm64 |
| Compute Savings Plans (1-yr) | ~30-50% vs on-demand | commit to $/hr spend |
| Spot (fault-tolerant work) | up to ~70-90% | can be interrupted |
Compute Savings Plans are flexible across instance family, size, and Region, so they are far safer than legacy Reserved Instances. Cover your steady baseline with a Savings Plan, run interruptible and batch work on Spot, and leave only the spiky remainder on on-demand. Right-size first, though: buying a commitment for an oversized fleet just locks in the waste.
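To see how the layers combine, here is a small worked calculation. All inputs are hypothetical: a fleet that would cost $10/hr on-demand, 60% of it steady baseline under a Savings Plan at a 35% discount, 25% on Spot at a 70% discount, and the remaining 15% on-demand. The helper name blended_hourly is made up for this sketch.

```shell
# Sketch: blended hourly cost of a fleet split across Savings Plan, Spot,
# and on-demand. All shares and discounts are illustrative assumptions.
blended_hourly() {
  # args: on-demand $/hr, baseline share, SP discount, Spot share, Spot discount
  awk -v od="$1" -v base="$2" -v sp="$3" -v spot="$4" -v sd="$5" 'BEGIN {
    rest = 1 - base - spot                      # share left on on-demand
    cost = od * (base * (1 - sp) + spot * (1 - sd) + rest)
    printf "blended=%.2f saving=%.1f%%\n", cost, (1 - cost / od) * 100
  }'
}

blended_hourly 10 0.60 0.35 0.25 0.70
# prints: blended=6.15 saving=38.5%
```

The arithmetic is 10 × (0.60 × 0.65 + 0.25 × 0.30 + 0.15) = 6.15. Note that the baseline share is applied to the right-sized fleet; commit against an oversized one and the same formula hides the waste inside the "saving".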
Step 3: Stop paying for idle and forgotten resources
This is the unglamorous money. A quick CLI sweep finds most of it:
# Unattached EBS volumes still billing every month
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{ID:VolumeId,GiB:Size,AZ:AvailabilityZone}' \
  --output table
# Old EBS snapshots nobody will ever restore
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[?StartTime<=`2025-01-01`].[SnapshotId,VolumeSize,StartTime]' \
  --output table
# Unassociated Elastic IPs: every public IPv4 address is billed hourly, so an idle one is pure waste
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].PublicIp' --output text
Add the usual suspects: dev environments left running over the weekend, over-provisioned RDS instances, and load balancers with zero targets.
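The weekend dev environment case is easy to automate. A minimal sketch, assuming instances carry the env tag from Step 1 with the value dev; the function name and the DRY_RUN variable are inventions of this example, and the function only prints by default.

```shell
# Sketch: stop running instances tagged env=dev (for example from a Friday-evening cron).
# Assumptions: the env=dev tag is applied consistently; DRY_RUN defaults to 1,
# in which case nothing is stopped and the candidate IDs are only printed.
stop_dev_instances() (
  ids="$(aws ec2 describe-instances \
    --filters Name=tag:env,Values=dev Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)"
  if [ -z "$ids" ]; then
    echo "no running dev instances"
  elif [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would stop: $ids"
  else
    # $ids is intentionally unquoted: one argument per instance ID
    aws ec2 stop-instances --instance-ids $ids
  fi
)

# Usage: stop_dev_instances            (dry run)
#        DRY_RUN=0 stop_dev_instances  (actually stops them)
```

Stopped instances still pay for their EBS volumes, so this trims compute hours, not storage.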
Step 4: Make S3 storage tier itself
Storage is a slow leak. The fix is mostly automatic: turn on S3 Intelligent-Tiering so objects move to cheaper tiers as access cools, with no retrieval fees for the frequent/infrequent tiers. Then add lifecycle rules to expire old versions, clean up incomplete multipart uploads (a classic invisible cost), and archive cold data to Glacier classes.
The cheapest gigabyte is the one you delete. The second cheapest is the one that tiered itself to Glacier without you thinking about it.
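The tiering and cleanup rules described above can be expressed as a single lifecycle rule. This is a sketch under stated assumptions: the rule ID, the 30-day and 7-day windows, and both function names are placeholders, and archiving to Glacier classes is left out to keep it short.

```shell
# Sketch: one lifecycle rule that moves objects into Intelligent-Tiering,
# expires noncurrent versions, and aborts stale multipart uploads.
# Assumptions: rule ID and day counts are placeholders to adapt.
write_lifecycle_policy() {
  # $1 = path of the JSON file to write
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-and-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 0, "StorageClass": "INTELLIGENT_TIERING" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
EOF
}

apply_lifecycle_policy() {
  # $1 = bucket name, $2 = path to the policy file
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "file://$2"
}

# Usage: write_lifecycle_policy lifecycle.json
#        apply_lifecycle_policy my-bucket lifecycle.json   (my-bucket is hypothetical)
```

Be aware that put-bucket-lifecycle-configuration replaces the bucket's whole lifecycle configuration, so merge any existing rules into the file first.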
Step 5: Mind the data transfer and the long tail
Revisit NAT Gateway data processing fees, cross-AZ chatter, and CloudWatch Logs retention (log groups never expire by default, so storage silently accrues). None of these are the headline number, but together they are often 10-15% of a bill. Set log retention, add S3/DynamoDB gateway endpoints so that traffic bypasses the NAT Gateway, and review per-Region spend for resources someone left in us-west-2 after a test.
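Setting log retention is the quickest of these to script. A minimal sketch, assuming a 30-day default is acceptable for any log group that has no policy yet; the function name is made up, and it acts only on the Region your CLI is configured for.

```shell
# Sketch: give every log group without a retention policy a finite one.
# Assumptions: 30 days is a placeholder default; run once per Region.
set_default_log_retention() (
  days="${1:-30}"
  # Log groups with no retentionInDays field are the "never expire" ones
  groups="$(aws logs describe-log-groups \
    --query 'logGroups[?!retentionInDays].logGroupName' \
    --output text)"
  for group in $groups; do
    echo "setting ${days}-day retention on ${group}"
    aws logs put-retention-policy \
      --log-group-name "$group" \
      --retention-in-days "$days"
  done
)

# Usage: set_default_log_retention 30
```

CloudWatch Logs accepts only specific retention values (for example 7, 14, 30, 60, 90), so pick one from that list.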
Takeaways
- Start with the CUR, tagging, and anomaly alerts: attribution before action, and guardrails so savings stick.
- Right-size, then layer Graviton, Compute Savings Plans for baseline, and Spot for interruptible work.
- Sweep idle resources: unattached volumes, stale snapshots, detached EIPs, weekend dev environments.
- Automate S3 tiering and lifecycle expiry, and clean up the transfer/logging long tail.