Rotating secrets with AWS Secrets Manager
Automatic rotation for database credentials and API keys, with zero downtime.
The first time I inherited a system with hard-coded database passwords in environment variables, I found the same credential pasted in four places: a Terraform variable, a Lambda config, a CI secret, and, naturally, a Slack message from 2021. Rotating it meant touching all four without breaking anything. We didn't rotate it for two years.
AWS Secrets Manager exists precisely to kill that pattern. But the part people get wrong is rotation: dropping a secret into Secrets Manager doesn't rotate it. You have to understand the four-step rotation contract, and that's what trips most teams up.
What rotation actually does
Secrets Manager rotation is driven by a Lambda function that Secrets Manager invokes on a schedule, passing a Step in the event. Your function must implement four discrete steps, each idempotent:
- createSecret: generate a new credential and store it as the AWSPENDING version.
- setSecret: apply the pending credential to the actual resource (e.g. ALTER USER ... PASSWORD).
- testSecret: verify the new credential works by actually using it.
- finishSecret: move the AWSCURRENT label to the pending version.
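The two-version model (AWSCURRENT and AWSPENDING) is the whole trick: consumers keep reading the old version through AWSCURRENT until finishSecret flips the label, so they never pick up a credential that hasn't been applied and tested. Whether the old password itself still authenticates during that window depends on the rotation strategy, covered next.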
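To make the contract concrete, here is a minimal sketch of a rotation function, not AWS's managed implementation. It assumes the secret is a JSON string with a password key. The set_secret and test_secret arguments are hypothetical hooks you would write for your own database; only createSecret and finishSecret are spelled out, because they touch nothing but Secrets Manager.

```python
import json


def create_secret(client, secret_id, token):
    """createSecret: stage a new password as AWSPENDING, unless it already exists."""
    try:
        client.get_secret_value(
            SecretId=secret_id, VersionId=token, VersionStage="AWSPENDING"
        )
        return  # this step is being retried; the pending version is already stored
    except client.exceptions.ResourceNotFoundException:
        pass
    current = client.get_secret_value(SecretId=secret_id, VersionStage="AWSCURRENT")
    new_secret = json.loads(current["SecretString"])
    new_secret["password"] = client.get_random_password(
        PasswordLength=32, ExcludePunctuation=True
    )["RandomPassword"]
    client.put_secret_value(
        SecretId=secret_id,
        ClientRequestToken=token,
        SecretString=json.dumps(new_secret),
        VersionStages=["AWSPENDING"],
    )


def finish_secret(client, secret_id, token):
    """finishSecret: move AWSCURRENT to the pending version; no-op if already there."""
    stages = client.describe_secret(SecretId=secret_id)["VersionIdsToStages"]
    current = next((v for v, s in stages.items() if "AWSCURRENT" in s), None)
    if current == token:
        return
    client.update_secret_version_stage(
        SecretId=secret_id,
        VersionStage="AWSCURRENT",
        MoveToVersionId=token,
        RemoveFromVersionId=current,
    )


def rotate(event, client, set_secret, test_secret):
    """Dispatch one rotation step based on the event Secrets Manager sends."""
    secret_id = event["SecretId"]
    token = event["ClientRequestToken"]
    steps = {
        "createSecret": create_secret,
        "setSecret": set_secret,    # hypothetical hook: apply the password to the database
        "testSecret": test_secret,  # hypothetical hook: log in with the pending password
        "finishSecret": finish_secret,
    }
    step = event["Step"]
    if step not in steps:
        raise ValueError(f"unknown rotation step: {step}")
    steps[step](client, secret_id, token)
```

In a real Lambda, the handler would call rotate(event, boto3.client("secretsmanager"), ...) with your own hooks. Each step checks the existing state before changing anything, which is what makes it safe for Secrets Manager to retry a step.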
The single-user vs two-user trade-off
For RDS and a few other databases, AWS ships managed rotation Lambdas with two strategies, and choosing wrong causes outages:
| Strategy | How it works | Watch out |
|---|---|---|
| Single user | Rotates the password of one user in place | Brief window where clients still holding the old password fail to authenticate on new connections; fine for low-churn clients with retries |
| Alternating users | Maintains two users, rotates the inactive one, then switches | Zero-downtime, but needs a separate superuser secret to clone the user and its grants, and double the users to manage |
If your app pools connections and can't tolerate a single failed auth, use the alternating-users strategy. The single-user one is simpler but assumes your clients reconnect cleanly.
Wiring it up with the CLI
For an RDS Postgres credential, you create the secret, then attach a rotation schedule pointing at the managed Lambda. The RotationRules support a cron-like ScheduleExpression as well as a simple day count:
# The managed RDS rotation functions expect an "engine" key in the secret JSON.
aws secretsmanager create-secret \
  --name prod/checkout/db \
  --secret-string '{"engine":"postgres","username":"app","password":"REPLACE_ME","host":"checkout.abc123.us-east-1.rds.amazonaws.com","port":5432,"dbname":"checkout"}'

aws secretsmanager rotate-secret \
  --secret-id prod/checkout/db \
  --rotation-lambda-arn arn:aws:lambda:us-east-1:111122223333:function:SecretsManagerRDSPostgreSQLRotationSingleUser \
  --rotation-rules '{"ScheduleExpression":"rate(30 days)"}'
By default, that rotate-secret call triggers an immediate rotation as well as setting the recurring schedule (pass --no-rotate-immediately to skip it), so it doubles as your verification that the whole pipeline works end to end.
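Rotation runs asynchronously, so the CLI returning successfully doesn't mean the four steps finished. One way to check is to look at the version labels afterwards: an AWSPENDING label sitting on a version that is not also AWSCURRENT means a rotation is still in progress or failed partway. The helper below is a small sketch of that check; the function name is my own, and it only inspects the dict that describe_secret returns.

```python
def in_flight_rotation(description):
    """Return the version ID of an unfinished rotation, or None.

    `description` is the response dict from describe_secret.
    """
    for version_id, stages in description.get("VersionIdsToStages", {}).items():
        if "AWSPENDING" in stages and "AWSCURRENT" not in stages:
            return version_id
    return None


# Usage, assuming boto3 and credentials are available:
#   import boto3
#   desc = boto3.client("secretsmanager").describe_secret(SecretId="prod/checkout/db")
#   print(in_flight_rotation(desc), desc.get("LastRotatedDate"))
```

If this keeps returning a version ID minutes after the call, the rotation function's CloudWatch logs are the place to look.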
Retrieving it from code, and caching
The mistake I see most: calling get_secret_value on every request. That adds latency and can hit API throttling. Use the AWS-provided caching library so you fetch once and only re-fetch when the cached value is older than the refresh interval:
import json

import boto3
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

client = boto3.client("secretsmanager")

# Re-fetch a cached secret once it is more than 300 seconds old.
cache = SecretCache(
    config=SecretCacheConfig(secret_refresh_interval=300),
    client=client,
)


def get_db_creds():
    # Served from the in-memory cache; only hits the API when a refresh is due.
    raw = cache.get_secret_string("prod/checkout/db")
    return json.loads(raw)
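The cache picks up the new AWSCURRENT version on its next refresh, so rotation and the application stay decoupled. Until that refresh, though, it can still hand out the previous version for up to secret_refresh_interval seconds. Keep the interval short relative to how often connections are opened, and treat an authentication failure as a signal to re-fetch the secret and retry rather than as a fatal error.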
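A cached credential can lag a rotation by up to one refresh interval, which matters most with single-user rotation. A common way to absorb that is to retry once with credentials fetched directly from Secrets Manager when the database rejects the cached ones. The sketch below shows the shape of that retry. All four arguments are hypothetical callables you would supply; in particular, is_auth_error depends on your database driver.

```python
def connect_with_fresh_creds(connect, cached_creds, fresh_creds, is_auth_error):
    """Connect with cached credentials; on an auth failure, retry once with fresh ones.

    connect(creds)      -> opens and returns a connection
    cached_creds()      -> credentials from the cache (fast path)
    fresh_creds()       -> credentials fetched straight from Secrets Manager
    is_auth_error(exc)  -> True if exc means "bad username or password"
    """
    try:
        return connect(cached_creds())
    except Exception as exc:
        if not is_auth_error(exc):
            raise  # network errors, timeouts, etc. are not our problem here
    # The cached password was rejected, most likely because a rotation
    # finished after the cache last refreshed. Bypass the cache and retry once.
    return connect(fresh_creds())
```

Here cached_creds could be a function like get_db_creds, and fresh_creds could wrap a direct client.get_secret_value(SecretId=...) call. Retrying only once keeps a genuinely wrong credential from turning into a tight loop against the API.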
Locking down access
Rotation only helps if the blast radius is small. A few non-negotiables I enforce:
- Encrypt with a customer-managed KMS key, not the default aws/secretsmanager key, so you control the key policy and audit decrypts separately.
- Scope IAM to specific secret ARNs with a path prefix like prod/checkout/*, never secretsmanager:GetSecretValue on *.
- Make sure CloudTrail is recording Secrets Manager API activity so every GetSecretValue is logged. These calls are recorded as management events, so a trail limited to write events will miss them.
- Use VPC endpoints (com.amazonaws.region.secretsmanager) so retrieval never leaves the AWS network.
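As an illustration of the IAM and KMS points, here is a sketch that builds a read-only policy document for one path prefix. The function name, statement IDs, and the key ARN in the example are placeholders of mine; the account ID and prefix reuse the ones from earlier in the post. Adjust the actions to what your service actually needs.

```python
import json


def read_only_secret_policy(region, account_id, prefix, kms_key_arn):
    """Build an IAM policy allowing reads of secrets under one path prefix."""
    secret_arn = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecretsUnderPrefix",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": secret_arn,
            },
            {
                # Needed because the secret is encrypted with a customer-managed key.
                "Sid": "DecryptWithSecretKey",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": kms_key_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = read_only_secret_policy(
        region="us-east-1",
        account_id="111122223333",
        prefix="prod/checkout",
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```

Generating the policy from a prefix makes it harder for a wildcard resource to slip in during a late-night fix.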
Takeaways
- Storing a secret isn't rotating it. Rotation is a four-step Lambda contract (create, set, test, finish) built on the AWSCURRENT/AWSPENDING version labels.
- Pick single-user rotation for simplicity, alternating-users for true zero-downtime with pooled connections.
- Cache retrieved secrets with the official caching library; never call GetSecretValue per request.
- Shrink blast radius with customer-managed KMS keys, ARN-scoped IAM, CloudTrail logging of GetSecretValue, and VPC endpoints.