Reduce AWS Bill: Auto-Delete Snapshots with Python Lambda Script
When old snapshots pile up, so do costs – try these slick tricks to trim the bloat
By Leif Hanson, Full Stack Developer
So the billing department calls to ask about eXtra eXpenses from AWS: 14,000 32 GB snapshots have piled up.
Unbeknownst to you, some backup script has gotten away from your team and started backing up repeatedly. Maybe you’re intentionally keeping around petabytes of backup data? Or maybe you need a pruning script?
Even worse, when you try to manually select and delete 50 snapshots at a time in the AWS Console, you get random cryptic warnings and failures to delete.
When this happened to me, I considered calling AWS support before eXhaling heartily through my nose in indignation and deciding to do it myself. Why not use the power of AWS Lambda to work around one of Amazon Web Services’ own bugs?
Well, script kiddies, grab a coffee and dust off your Python skills because, like Leeroy Jenkins, we’re going in!
“Oh no, Leeroy, STOP!!!” one of your edge devs yells. Why? Because like most things AWS, we can’t just go around writing Python scripts all willy-nilly. First, we need an IAM role and some permission policies.
“Jeez Leeroy, you’re gonna get us killed here,” mumbles upper management.
How to Delete Snapshots to Reduce AWS Bill
1. Head to IAM -> Roles and make yourself a new role.
2. Add the AWSLambdaBasicExecutionRole and AWSXRayDaemonWriteAccess permission policies, as usual for a Lambda role.
3. Because we’re working with EC2, add the AmazonEC2FullAccess permission policy. If your company keeps IAM permissions tight, a narrower policy may be worth investigating (the script itself only calls the ec2:DescribeSnapshots and ec2:DeleteSnapshot actions), but I’ll leave that to a future episode.
For now, these three permission policies should be enough to run the Lambda script, and your IAM Role summary should list all three of them.
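If you would rather script the role setup above than click through the IAM console, here is a minimal boto3 sketch. The role name is a hypothetical placeholder, and SNAPSHOT_CLEANUP_POLICY is an assumed least-privilege alternative to AmazonEC2FullAccess that covers only the two EC2 actions the cleanup script calls; try it against your own account before relying on it.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The three AWS managed policies attached in the console steps.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess",
    "arn:aws:iam::aws:policy/AmazonEC2FullAccess",
]

# Assumed narrower alternative to AmazonEC2FullAccess: only the EC2 actions
# the snapshot cleanup script actually calls.
SNAPSHOT_CLEANUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeSnapshots", "ec2:DeleteSnapshot"],
            "Resource": "*",
        }
    ],
}


def create_cleanup_role(role_name="snapshot-cleanup-lambda-role"):
    """Create the role and attach the managed policies (caller needs IAM rights)."""
    import boto3  # imported here so the policy documents can be inspected without AWS access

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

Calling create_cleanup_role() attaches the same three managed policies as the console steps. To use the narrower policy instead, you could drop AmazonEC2FullAccess from the list and call iam.put_role_policy(RoleName=role_name, PolicyName="snapshot-cleanup", PolicyDocument=json.dumps(SNAPSHOT_CLEANUP_POLICY)).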
With our IAM Role set, we can catch back up to Leeroy over in AWS Lambda.
4. Create a standard Python 3.7 Lambda function and attach the IAM Role. For your “Test” scenario, just choose the “Hello World” settings; they are irrelevant to this script because the handler ignores the test event and returns nothing.
5. In lambda_function.py add the following script:
```python
import boto3
import datetime
from botocore.exceptions import ClientError

############################################################
# leifCleanAwsEc2Snapshots
# Script will delete all snapshots created before dateLimit.
# ALL SNAPSHOTS OLDER THAN THIS DATE WILL BE DELETED!!!
dateLimit = datetime.datetime(2018, 1, 1)  # yyyy, mm, dd (no leading zeros)
############################################################

# AWS Settings (the owner account ID is set in describe_snapshots below)
client = boto3.client('ec2', region_name='us-east-1')


def lambda_handler(event, context):
    # Fetch the list on every invocation so a reused (warm) container
    # never works from a stale snapshot list.
    snapshots = client.describe_snapshots(OwnerIds=['1111111111111'])

    # Calculate the number of days ago the date limit is.
    dateToday = datetime.datetime.now()
    dateDiff = dateToday - dateLimit
    # Could base this clean-up on the number of snapshots too.
    # snapshotCount = len(snapshots['Snapshots'])

    # The loop must stay indented inside lambda_handler: dateDiff only exists here.
    for snapshot in snapshots['Snapshots']:
        a = snapshot['StartTime']
        b = a.date()
        c = datetime.datetime.now().date()
        d = c - b  # age of this snapshot
        try:
            if d.days > dateDiff.days:
                id = snapshot['SnapshotId']
                started = snapshot['StartTime']
                print(id + "********************")
                print(started)
                #Uncomment below line for "live run"
                #client.delete_snapshot(SnapshotId=id)
                print("DELETED^^^^^^^^^^^^^^^^^^")
        except ClientError as e:
            # Snapshots still used by a registered AMI can't be deleted; skip them.
            if e.response['Error']['Code'] == 'InvalidSnapshot.InUse':
                print("skipping this snapshot")
                continue
            raise
```
6. Once you’ve saved the script, you could click the “Test” button to run it, as Leeroy is over in the corner currently doing. However, it’s not going to run yet.
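7. Enter your AWS details for Region and OwnerID: the region_name passed to boto3.client() and the OwnerIds list passed to describe_snapshots(). OwnerIds expects the 12-digit AWS account ID that owns the snapshots, not an IAM role or user; you can also pass the alias 'self' for the account the function runs in. Your mileage may vary depending on your EC2 setup, so find the account that owns those snapshots and enter its ID.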
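Instead of editing region_name and OwnerIds in the code for each account, you could read them from the function’s environment. A minimal sketch, assuming a hypothetical SNAPSHOT_OWNER_ID environment variable that you add yourself (AWS_REGION is set by Lambda automatically). The check rejects anything that is not a 12-digit account ID or one of the aliases describe_snapshots accepts:

```python
import os
import re

# Fallbacks used when the environment variables are missing.
DEFAULT_REGION = "us-east-1"
DEFAULT_OWNER = "self"  # "self" means the account the function runs in


def load_settings(environ=os.environ):
    """Return (region, owner_ids) for boto3.client() and describe_snapshots()."""
    region = environ.get("AWS_REGION", DEFAULT_REGION)
    # SNAPSHOT_OWNER_ID is a hypothetical variable, not one Lambda defines.
    owner = environ.get("SNAPSHOT_OWNER_ID", DEFAULT_OWNER)
    # OwnerIds accepts 12-digit account IDs or the aliases "self" and "amazon".
    if owner not in ("self", "amazon") and not re.fullmatch(r"\d{12}", owner):
        raise ValueError(f"Owner must be a 12-digit account ID, 'self', or 'amazon': {owner!r}")
    return region, [owner]
```

In the handler you would then write region, owner_ids = load_settings() and pass the values to boto3.client('ec2', region_name=region) and client.describe_snapshots(OwnerIds=owner_ids).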
No, Leeroy, not yet, stop pressing that Test button.
8. We still need to set our date. The core purpose of this script is to delete all EC2 snapshots created before a certain date, so make sure you set the “dateLimit” variable to whichever date you want. I recommend starting with a date far in the past and working your way forward.
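It can help to see what the date check in the handler actually tests. Both d and dateDiff count days back from today, so the comparison d.days > dateDiff.days holds exactly when the snapshot’s start date falls before dateLimit’s date. Here is a sketch of that equivalent check, using a hypothetical helper name, which you could use to sanity-check a dateLimit before a dry run:

```python
import datetime


def is_before_limit(start_time, date_limit):
    """True when a snapshot's StartTime falls on a calendar day before date_limit.

    start_time: the timezone-aware datetime boto3 returns in snapshot['StartTime']
    date_limit: the naive datetime used for dateLimit (midnight of the cutoff day)
    """
    # Comparing .date() values avoids mixing aware and naive datetimes,
    # just as the original script does.
    return start_time.date() < date_limit.date()
```

For example, with dateLimit = datetime.datetime(2018, 1, 1), a snapshot started at 2017-12-31 23:59 UTC qualifies for deletion, while one started on 2018-07-08 does not.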
Now, since we’ve got our dateLimit set, and we’ve got the right AWS Region and OwnerID, and because Leeroy is still hammering that Test button over there, our script finally produces a “dry run.”
What you should eXpect to see is a list of snapshot IDs (and the dates they were created) in the console. It should look something like this:
```
leifCleanAwsEc2Snapshots
July 28, 2020

Run 1:
Set date limit: 1-1-2019
Perform dry run. OK
Set execution time: 5 minutes
SUCCESS: Removed 1308 snapshots.

Log:
snap-073d12f68267c9178********************
2018-07-08 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0bfef6b0debd78d6a********************
2018-07-08 15:37:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0432066f851a8b197********************
2018-07-08 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-00b67e45a0a84ec32********************
2018-07-08 14:02:08+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-0554604daa7202a4b********************
2018-07-07 15:37:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
snap-07b7d235e82414dda********************
2018-07-07 15:36:21+00:00
DELETED^^^^^^^^^^^^^^^^^^
Number Deleted: 19
```
9. While Leeroy celebrates his victory, you head back to EC2 and notice that no snapshots have been deleted. That’s because to run this script as a “live run” you must uncomment the line #client.delete_snapshot(SnapshotId=id).
10. Assuming the above console output lists only snapshots you want deleted, and assuming Leeroy hasn’t broken anything, you can uncomment that line, save, and then walk over to the corner and slap Leeroy’s hand before finally pressing the “Test” button like the responsible DevOps engineer you are.
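Commenting and uncommenting #client.delete_snapshot(SnapshotId=id) works, but it is easy to lose track of which version is deployed. One alternative, sketched here under the assumption that you add a hypothetical LIVE_RUN environment variable to the function, makes the switch explicit so that going live is a configuration change rather than a code edit:

```python
import os


def delete_or_report(client, snapshot_id, live=None):
    """Delete snapshot_id only when live is True; otherwise just report it.

    When live is None, fall back to the hypothetical LIVE_RUN environment
    variable, treating only the string "true" (any case) as a live run.
    """
    if live is None:
        live = os.environ.get("LIVE_RUN", "").lower() == "true"
    if live:
        client.delete_snapshot(SnapshotId=snapshot_id)
        print(snapshot_id + " DELETED")
    else:
        print(snapshot_id + " would be deleted (dry run)")
    return live
```

Inside the loop you would call delete_or_report(client, id) in place of the commented-out line and the DELETED print; leaving LIVE_RUN unset keeps every run a dry run.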
Depending on how many snapshots meet the delete criteria, this script can take a long time to run, so raise the function’s timeout in its configuration (Lambda’s default is only 3 seconds). I estimate about 300 – 350 snapshots can be deleted every 5-minute run. Using the 15-minute maximum Lambda eXecution time, you can wipe out about 1,000 snapshots in a single run.
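Rather than guessing how many deletions fit in one run, the handler can ask Lambda how much time is left: the context object passed to lambda_handler has a get_remaining_time_in_millis() method. The sketch below stops cleanly before the timeout so the next run can pick up where this one left off; the helper name and the 10-second safety margin are assumptions.

```python
def delete_within_budget(snapshot_ids, delete, context, margin_ms=10_000):
    """Call delete(snapshot_id) for each ID until Lambda time runs low.

    snapshot_ids: iterable of IDs already filtered by the date check
    delete: function that deletes one snapshot, e.g.
            lambda sid: client.delete_snapshot(SnapshotId=sid)
    context: the Lambda context object passed to lambda_handler
    margin_ms: stop once fewer than this many milliseconds remain
    Returns the number of snapshots deleted in this invocation.
    """
    deleted = 0
    for snapshot_id in snapshot_ids:
        if context.get_remaining_time_in_millis() < margin_ms:
            print("Stopping early; run again to continue.")
            break
        delete(snapshot_id)
        deleted += 1
    print("Number Deleted:", deleted)
    return deleted
```

At the 300 – 350 deletions per 5 minutes estimated above, a 900-second timeout works out to roughly 900 to 1,050 deletions per run, which is where the figure of about 1,000 comes from.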
Conclusion: This script could easily be adapted to fire whenever an EC2 snapshot (or another kind of snapshot) is created; that way, one could tie the pruning script to the creation of new snapshots automatically (if one were so inclined).
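As a starting point for that adaptation, an Amazon EventBridge rule can invoke the function when EBS reports a new snapshot. The pattern and field names below follow AWS’s documented “EBS Snapshot Notification” event as best we know it; verify them against a real event before relying on this sketch. The handler only parses the event, and plugging in the pruning logic is left to you.

```python
import json

# EventBridge event pattern matching successful EBS snapshot creation.
SNAPSHOT_CREATED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EBS Snapshot Notification"],
    "detail": {"event": ["createSnapshot"], "result": ["succeeded"]},
}
# String form, as EventBridge's put_rule expects for EventPattern.
EVENT_PATTERN_JSON = json.dumps(SNAPSHOT_CREATED_PATTERN)


def snapshot_id_from_event(event):
    """Return the snap-... ID from an EBS Snapshot Notification event, or None."""
    if event.get("detail-type") != "EBS Snapshot Notification":
        return None
    detail = event.get("detail", {})
    if detail.get("event") != "createSnapshot" or detail.get("result") != "succeeded":
        return None
    # snapshot_id arrives as an ARN ending in "snapshot/snap-..."; keep the last segment.
    return detail.get("snapshot_id", "").rsplit("/", 1)[-1] or None


def lambda_handler(event, context):
    snapshot_id = snapshot_id_from_event(event)
    if snapshot_id:
        print("New snapshot:", snapshot_id)
        # Run the date-based pruning here (for example, the loop from step 5).
```

To wire it up, you could create the rule with the EventBridge client’s put_rule(Name=..., EventPattern=EVENT_PATTERN_JSON), add the function as a target with put_targets, and grant events.amazonaws.com permission to invoke the function.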
Disclaimer: Use at your own risk. Once the “live run” line above has been uncommented, this script can and will delete hundreds of snapshots without further warning. With great power comes great responsibility and I recommend you play with the script in “dry run” mode without uncommenting that line for several Tests to understand eXactly how the script will work before running a live run. Indeed, the word “Test” on the button to start this script is a misnomer here since we’re hand-firing a script that will do much more than simply “Test.” It will lay waste to potentially petabytes of snapshot data.
Avoid Snapshot Pileups with Amazon Data Lifecycle Manager
Unlike Leeroy, you may prefer to engage in meaningful planning and strategizing so those snapshots don’t pile up again. One tool for this task is the Amazon Data Lifecycle Manager, which allows users to configure settings to automate the creation, retention and deletion of snapshots. Using the Data Lifecycle Manager you can create a policy that stipulates eXactly when snapshots are taken and how long they are retained before deletion.
Amazon Data Lifecycle Manager: The Basics
- Step 1 – Open the Data Lifecycle Manager. Look in the Amazon EC2 console, under Elastic Block Store.
- Step 2 – Tags. Create and assign tags to the EBS volumes (or instances) slated for backup, since lifecycle policies identify which resources to snapshot based on tags.
- Step 3 – Create Snapshot Lifecycle Policy. Choose how often to run the policy (such as every 2 hours), the starting time (such as 0700 UTC), and the retention type (by count or age, such as retaining only the 5 most recent snapshots).
- Step 4 – Enable policy.
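The same policy can also be created through the API. Below is a minimal boto3 sketch of the example settings from Step 3 (every 2 hours starting at 07:00 UTC, keeping the 5 most recent snapshots), assuming volumes tagged Backup=true and the default AWSDataLifecycleManagerDefaultRole. The tag, schedule name, description and account ID are placeholders. If the default role doesn’t exist yet, the console creates it on first use, and the AWS CLI also offers aws dlm create-default-role.

```python
def build_policy_details(tag_key="Backup", tag_value="true"):
    """PolicyDetails for a DLM policy: snapshot tagged volumes every 2 hours
    from 07:00 UTC and keep only the 5 most recent snapshots."""
    return {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "Every2Hours",
                "CreateRule": {"Interval": 2, "IntervalUnit": "HOURS", "Times": ["07:00"]},
                "RetainRule": {"Count": 5},
                "CopyTags": True,
            }
        ],
    }


def create_policy(account_id, region="us-east-1"):
    """Create and enable the policy (requires the default DLM role to exist)."""
    import boto3  # deferred so build_policy_details can be used without AWS access

    dlm = boto3.client("dlm", region_name=region)
    return dlm.create_lifecycle_policy(
        ExecutionRoleArn=f"arn:aws:iam::{account_id}:role/AWSDataLifecycleManagerDefaultRole",
        Description="Snapshot tagged volumes every 2 hours and keep 5",
        State="ENABLED",
        PolicyDetails=build_policy_details(),
    )
```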
Sounds simple enough, but beware that you could still end up with far too many snapshots being stored, depending on how the policy is configured. After setup, you’ll want to monitor your snapshots to ensure they are being created and deleted as planned.
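For that monitoring, a quick per-volume count can flag volumes holding more snapshots than the policy should allow. Here is a minimal sketch; the function names and the threshold are illustrative, and describe_snapshots is read through boto3’s paginator so that accounts with many snapshots are fully covered:

```python
from collections import Counter


def volumes_over_limit(snapshots, max_per_volume):
    """Return {VolumeId: count} for volumes with more than max_per_volume snapshots.

    snapshots: the dicts found under 'Snapshots' in describe_snapshots responses
    """
    counts = Counter(s.get("VolumeId", "unknown") for s in snapshots)
    return {vol: n for vol, n in counts.items() if n > max_per_volume}


def fetch_own_snapshots(region="us-east-1"):
    """Collect every snapshot owned by this account across all result pages."""
    import boto3  # deferred so volumes_over_limit can be tested without AWS access

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_snapshots")
    snapshots = []
    for page in paginator.paginate(OwnerIds=["self"]):
        snapshots.extend(page["Snapshots"])
    return snapshots
```

With a retention count of 5, print(volumes_over_limit(fetch_own_snapshots(), 5)) would list any volume whose snapshots are piling up past what the policy should keep.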
If you need help setting up the Data Lifecycle Manager or pruning AWS snapshots, we’re here for you. Or maybe you need some other tweaking with your AWS Console or website configuration?
Check out our website tech support & maintenance services for more information.
To read more web tips and tricks from our web app developers, check out eX-Cetera, our blog where we offer WordPress Tips and Tricks and Web Tips and Tricks.
**This article is provided for free and as-is; use, enjoy, learn, and eXperiment at your own risk – but have fun! eXcelisys does not offer free support or free assistance with any of the contents of this blog post. If you would like assistance, please contact one of our Solutions Services Consultants today.
Hi Leif. I keep getting the following error:
[ERROR] NameError: name 'dateDiff' is not defined
Traceback (most recent call last):
File "/var/lang/lib/python3.7/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/var/lang/lib/python3.7/imp.py", line 171, in load_source
module = _load(spec)
File "", line 696, in _load
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/var/task/lambda_function.py", line 30, in
I am not seeing the issue in the script and I keep banging my head against the desk. Any recommendations?
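Try checking the indentation first. dateDiff is created inside lambda_handler, so the for snapshot in snapshots['Snapshots']: loop and everything under it must be indented inside that function. If the loop sits at the left margin, Python runs it while loading the module, before lambda_handler is ever called, and that raises exactly this NameError. I think that indentation may have been stripped out in a copy and paste somewhere between my source code and this blog post.
While you’re in there, set line 9 to whatever date you want, for example:
dateLimit = datetime.datetime(2022, 1, 1) # yyyy, mm, dd
Don’t use leading zeros for the month and day (2022, 01, 01), though: Python 3 rejects them with a SyntaxError.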
Disclaimer: If / When this works, the successful behavior will attempt to delete any AWS snapshots for the date range you define. Extreme caution is advised. Usually best to start with small date ranges and get it working before you start deleting large chunks of your AWS snapshot history.