The 2025 guide to AWS cost optimization
A structured walk through every major lever, from compute commitments to storage tiering.
I've run AWS cost reviews for long enough to have a strong opinion about where the money actually hides, and it has shifted. A few years ago the easy wins were Reserved Instances and deleting unattached EBS volumes. In 2025 the low-hanging fruit is gone for most mature accounts; the savings now live in commitment strategy, Graviton, storage tiering, and, increasingly, the cost of whatever you're doing with GPUs and foundation models.
This is the checklist I work through, roughly in order of effort-to-reward.
1. Get visibility before you touch anything
You cannot optimize what you can't attribute. Before any cutting, enforce a tagging policy (env, team, service, cost-center) with AWS Organizations tag policies, and activate those keys as cost allocation tags in the Billing console so they show up in your billing data. Then set up a CUR 2.0 export through Data Exports and query it in Athena, where a user-defined tag such as team appears under the key user_team. The single most useful query I run finds spend that no team owns:
SELECT line_item_product_code,
       SUM(line_item_unblended_cost) AS cost
FROM cur2
WHERE line_item_usage_start_date >= DATE '2025-06-01'
  -- element_at returns NULL when a row has no user_team key,
  -- so missing and empty tags are both caught.
  AND COALESCE(element_at(resource_tags, 'user_team'), '') = ''
GROUP BY line_item_product_code
ORDER BY cost DESC;
Untagged spend is almost always where the waste is, because nobody feels responsible for it.
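The same check is worth running before resources ever reach the bill. Here is a minimal Python sketch, assuming the four tag keys from the policy above; the function names and the shape of the line-item dicts are hypothetical, not an AWS API:

```python
# Required keys from the tagging policy described above.
REQUIRED_TAGS = ("env", "team", "service", "cost-center")


def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or blank."""
    return [key for key in required if not (tags.get(key) or "").strip()]


def untagged_share(line_items, tag_key="team"):
    """Fraction of total cost on line items with no usable tag_key.

    line_items is a hypothetical list of {"cost": float, "tags": dict}.
    """
    total = sum(item["cost"] for item in line_items)
    if total == 0:
        return 0.0
    unowned = sum(
        item["cost"]
        for item in line_items
        if not (item.get("tags", {}).get(tag_key) or "").strip()
    )
    return unowned / total


if __name__ == "__main__":
    print(missing_tags({"env": "prod", "team": ""}))
    items = [
        {"cost": 75.0, "tags": {"team": "payments"}},
        {"cost": 25.0, "tags": {}},
    ]
    print(untagged_share(items))
```

Tracking the untagged share month over month gives you a single number to drive toward zero.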
2. Commitments: Savings Plans over RIs, mostly
For compute, Compute Savings Plans have largely replaced Reserved Instances because they apply across instance family, size, region, and even between EC2, Fargate, and Lambda. The trade-off is the classic one:
| Commitment | Discount | Flexibility |
|---|---|---|
| On-Demand | 0% | Total |
| Compute Savings Plan | up to ~66% | High (any family/region/service) |
| EC2 Instance Savings Plan | up to ~72% | Low (locked to family + region) |
| Spot | up to ~90% | Low (can be reclaimed with a 2-minute warning) |
The mistake I see most is over-committing. Buy a Savings Plan to cover your stable baseline only: the floor of usage you're confident will persist for the whole term. Cover the variable layer with On-Demand and Spot. A 1-year, no-upfront Compute Savings Plan at the baseline is the highest-ROI, lowest-regret move for most teams.
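To see why over-committing hurts, here is a simplified Python model of the arithmetic. It assumes a single uniform discount across all usage (real Savings Plan rates vary by instance type and region), and the function names, the 10th-percentile rule, and the sample numbers are illustrative assumptions rather than AWS guidance:

```python
def hourly_cost(on_demand_spend, commitment, discount):
    """Cost of one hour under a Savings Plan commitment.

    on_demand_spend: what the hour's usage would cost at On-Demand rates.
    commitment: committed $/hour, paid whether or not it is used.
    discount: fractional discount, assumed uniform for this sketch.
    """
    # On-Demand-equivalent usage that the commitment absorbs.
    covered = commitment / (1.0 - discount)
    # Anything above that is billed at On-Demand rates.
    overflow = max(0.0, on_demand_spend - covered)
    return commitment + overflow


def total_cost(hourly_spend, commitment, discount):
    """Sum of hourly costs over a series of On-Demand-equivalent spend."""
    return sum(hourly_cost(s, commitment, discount) for s in hourly_spend)


def baseline_commitment(hourly_spend, discount, percentile=0.10):
    """Commitment sized to a low percentile of hourly spend (the floor)."""
    ordered = sorted(hourly_spend)
    index = int(percentile * (len(ordered) - 1))
    return ordered[index] * (1.0 - discount)


if __name__ == "__main__":
    spend = [10.0, 10.0, 10.0, 20.0]  # made-up hourly On-Demand spend
    discount = 0.25                   # made-up uniform discount
    floor = baseline_commitment(spend, discount)
    for label, c in [("none", 0.0), ("baseline", floor), ("peak", 15.0)]:
        print(label, total_cost(spend, c, discount))
```

With these made-up numbers, committing to the floor costs 40 against 50 at pure On-Demand, while committing to the peak costs 60: the unused commitment in the quiet hours more than cancels the discount.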
3. Graviton, almost everywhere
Moving x86 workloads to Graviton (Arm) instances typically delivers 20-40% better price-performance, and in 2025 the ecosystem friction is largely gone: RDS, ElastiCache, Lambda, ECS/EKS, and OpenSearch all support it. For interpreted-language services it's often a flag change. The catch is native dependencies that aren't built for Arm, so validate in a canary before flipping production.
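When you run that canary, compare work done per dollar rather than raw speed. A small Python sketch of the calculation follows; the throughput figures and hourly prices are hypothetical placeholders, so substitute your own measurements and current instance pricing:

```python
def price_performance_gain(x86_rps, x86_hourly, arm_rps, arm_hourly):
    """Relative gain in requests per dollar of the Arm canary over x86.

    Returns 0.25 for a 25% improvement; negative values mean a regression.
    """
    x86_per_dollar = x86_rps / x86_hourly
    arm_per_dollar = arm_rps / arm_hourly
    return arm_per_dollar / x86_per_dollar - 1.0


if __name__ == "__main__":
    # Hypothetical canary: identical throughput, 20% lower hourly price.
    gain = price_performance_gain(
        x86_rps=1000.0, x86_hourly=0.20,
        arm_rps=1000.0, arm_hourly=0.16,
    )
    print(f"price-performance gain: {gain:.0%}")
```

Note that a 20% lower price at equal throughput works out to about a 25% gain in requests per dollar, which is why price cuts and price-performance percentages never quite match.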
4. Storage tiering you should just turn on
- S3 Intelligent-Tiering: for any bucket with unpredictable access, write objects with the Intelligent-Tiering storage class (or add a lifecycle rule that transitions existing objects into it) and let AWS move objects between access tiers automatically. Apart from a small per-object monitoring charge, it's close to free insurance against paying Standard rates for cold data.
- gp3 over gp2: gp3 EBS is roughly 20% cheaper per GB than gp2 and lets you provision IOPS and throughput independently of volume size. There is rarely a reason to still be on gp2.
- Delete what's orphaned: unattached EBS volumes, old snapshots, idle load balancers, and unassociated Elastic IPs. Boring, recurring, real money.
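The EBS items above can be sized with a few lines of Python. This sketch works on dicts shaped like the `Volumes` entries that boto3's `describe_volumes` returns (`State`, `Size`, `VolumeType`); the per-GB prices are assumed us-east-1 list prices and the function names are my own, so check current pricing before quoting the result:

```python
# Assumed list prices in USD per GB-month; verify for your region.
GP2_PRICE = 0.10
GP3_PRICE = 0.08


def unattached_volumes(volumes):
    """Volumes in the 'available' state, i.e. not attached to an instance."""
    return [v for v in volumes if v.get("State") == "available"]


def gp3_monthly_savings(volumes):
    """Estimated monthly saving from converting gp2 volumes to gp3.

    Counts the capacity charge only; extra provisioned IOPS or
    throughput on gp3 would reduce the saving.
    """
    gp2_gib = sum(v["Size"] for v in volumes if v.get("VolumeType") == "gp2")
    return gp2_gib * (GP2_PRICE - GP3_PRICE)


if __name__ == "__main__":
    # Hypothetical inventory in the describe_volumes response shape.
    volumes = [
        {"VolumeId": "vol-1", "State": "available", "Size": 100, "VolumeType": "gp2"},
        {"VolumeId": "vol-2", "State": "in-use", "Size": 400, "VolumeType": "gp2"},
        {"VolumeId": "vol-3", "State": "in-use", "Size": 50, "VolumeType": "gp3"},
    ]
    print([v["VolumeId"] for v in unattached_volumes(volumes)])
    print(round(gp3_monthly_savings(volumes), 2))
```

One caveat on the estimate: large gp2 volumes earn baseline IOPS in proportion to their size, so matching that performance on gp3 may require paying for provisioned IOPS above the gp3 baseline.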
5. The new line item: GPU and inference spend
The cost category that didn't exist on my checklist a few years ago is the dominant variable cost on many accounts now. If you're running models, the levers are: prefer managed, pay-per-use inference (Bedrock) over an always-on GPU endpoint for spiky traffic; use SageMaker Asynchronous Inference, which can scale to zero instances when the queue is empty, to avoid paying for idle GPUs (Serverless Inference does the same for models that run acceptably on CPU, since it does not offer GPU instances); right-size to the smallest accelerator that meets latency; and cache aggressively. An always-on p4d or p5 instance left running over a weekend can dwarf the savings from everything else on this list.
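The always-on versus pay-per-use decision comes down to a break-even volume. The Python sketch below shows the arithmetic; the hourly rate and per-request price are hypothetical placeholders (managed inference is usually billed per token, so treat cost per request as an average you estimate from your own traffic):

```python
def idle_cost(hourly_rate, idle_hours):
    """Money spent on an accelerator that is running but serving nothing."""
    return hourly_rate * idle_hours


def break_even_requests_per_hour(endpoint_hourly_rate, cost_per_request):
    """Request volume above which an always-on endpoint becomes cheaper
    than paying per request on a managed service."""
    return endpoint_hourly_rate / cost_per_request


if __name__ == "__main__":
    rate = 30.0          # hypothetical $/hour for a large GPU instance
    weekend_hours = 64   # Friday evening to Monday morning
    print(f"idle weekend: ${idle_cost(rate, weekend_hours):,.0f}")

    per_request = 0.002  # hypothetical average $/request on managed inference
    threshold = break_even_requests_per_hour(rate, per_request)
    print(f"break-even: about {threshold:,.0f} requests/hour")
```

If sustained traffic sits well below the break-even figure, the managed or scale-to-zero option wins; if it sits well above, a dedicated endpoint under a commitment starts to make sense.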
Cost optimization is not a project you finish; it's a control loop you run. The teams that stay efficient have a monthly review and an owner, not a one-time cleanup sprint.
6. Let the tools do the finding
Turn on AWS Compute Optimizer for right-sizing recommendations, Cost Anomaly Detection for surprise spikes, and Trusted Advisor's cost checks. None of these act on their own, but they surface candidates so your review is data-driven instead of vibes-driven.
Takeaways
- Visibility first: enforce tagging and query CUR 2.0 in Athena. Untagged spend is where the waste hides.
- Cover your stable baseline with a no-upfront Compute Savings Plan; don't over-commit, and use Spot for the variable layer.
- Graviton and gp3/Intelligent-Tiering are near-free price-performance wins worth turning on broadly.
- GPU/inference spend is the new dominant variable cost: kill idle accelerators and prefer managed or scale-to-zero inference for spiky traffic.