AWS is a great platform for managing data. However, if you’re not careful, you can end up spending a lot of money on data storage and management. Here are some tips to help reduce your AWS data usage:
- Use deduplication tools to reduce the number of files in your AWS account. This will help you save on storage costs and improve performance.
- Use S3 for large files that you read often, rather than Amazon Glacier or HDFS. Glacier is cheaper per GB but is built for archives, with retrieval delays and fees, and HDFS means running your own cluster.
- Use IAM policies to restrict access to certain folders and files in your AWS account. This will help keep your data safe and secure while it’s stored in AWS.
Use AWS’s CloudFront CDN
CloudFront is a Content Delivery Network, or CDN, a service that sits in front of your website, API, or any other web service. It caches responses at the edge (close to the user), which improves performance, but it can also reduce your costs in a number of ways.
First off, CloudFront has a much larger free tier, and it doesn’t expire. You get 1 TB of data transfer out for free each month. AWS’s standard free tier is 100 GB, so if you send 1 TB or more each month, this change alone will save you about $80 every month.
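As a rough check on that figure, here is a minimal sketch of the arithmetic. The flat $0.09 per GB rate is an assumption chosen because it roughly reproduces the $80 figure; real AWS pricing is tiered and varies by region. The function name `monthly_egress_cost` is hypothetical.

```python
# Assumed flat egress rate in USD per GB; real AWS pricing is tiered and regional.
ASSUMED_RATE_PER_GB = 0.09


def monthly_egress_cost(gb_out: float, free_gb: float,
                        rate_per_gb: float = ASSUMED_RATE_PER_GB) -> float:
    """Cost of data transfer out after subtracting the free allowance."""
    billable_gb = max(gb_out - free_gb, 0.0)
    return billable_gb * rate_per_gb


# A site that sends out 1 TB a month (taken as 1000 GB here), billed with
# a 100 GB free allowance versus a 1 TB free allowance.
standard = monthly_egress_cost(1000, free_gb=100)
cloudfront = monthly_egress_cost(1000, free_gb=1000)
print(f"standard: ${standard:.2f}, CloudFront: ${cloudfront:.2f}, "
      f"saved: ${standard - cloudfront:.2f}")
```

Under these assumptions, the difference comes entirely from the 900 GB that the smaller free allowance does not cover.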
For larger businesses spending way more than that, CloudFront is also priced aggressively per GB of data. If you compare EC2’s pricing to CloudFront’s pricing, you’ll see CloudFront offers huge savings over standard EC2 pricing. AWS’s data pricing is tiered, so for the first 50 TB, you’ll save 6%, then 15%, then up to 50% off if you’re reaching petabyte-levels of data.
Finally, CloudFront can also save you money in another way: by reducing stress on the origin servers with edge caching, you’ll end up needing fewer origin servers. CloudFront’s servers will take the brunt of the traffic, meaning you can spend less on compute each month.
It’s important to note that CloudFront isn’t just for websites. It can be used for caching any kind of web request, such as high-traffic GET requests to an API. Even if it’s not caching, you can still put it in front of your API to serve data through it. Transfer from AWS origins to CloudFront is free, and you then pay CloudFront’s lower rates for the data that reaches users, saving you money in the process.
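For CloudFront to cache API responses, the origin usually needs to say how long a response may be reused, typically with a `Cache-Control` header. The following is a minimal sketch using only Python’s standard library. The handler, the payload, and the 60-second lifetime are all hypothetical, and CloudFront’s own cache policy settings can override what the origin sends.

```python
import json
from http.server import BaseHTTPRequestHandler


def cache_headers(max_age_seconds: int) -> dict:
    """Headers that mark a response as reusable by shared caches such as a CDN."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


class ApiHandler(BaseHTTPRequestHandler):
    """Hypothetical read-only API endpoint; the payload is made up."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        # Allow the CDN to answer repeat requests from the edge for 60 seconds.
        for name, value in cache_headers(60).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), ApiHandler)` and call `serve_forever()`. Responses that differ per user should not be marked `public`.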
Use AWS Lightsail
In an effort to compete with simpler hosting providers aimed at regular people, like DigitalOcean, AWS created Lightsail, which offers simple pricing for simple compute instances, databases, and networking. These instances are designed for running basic hosting software like WordPress and NGINX, and are easy to set up with pre-made templates.
The important part is that each package comes with a fixed bandwidth allowance at an extremely cheap rate. The $5 instance, for example, comes with 2 TB of bandwidth, which would cost $85 on CloudFront even after subtracting the free tier’s 1 TB.
This is great, but it comes with a catch: AWS’s service terms forbid using Lightsail in a manner intended to avoid data fees from other AWS services.
What this means in practice is that Lightsail should operate in its own VPC, and you shouldn’t connect external services like EC2 or S3 to it with the intention of proxying traffic through it to save money. This clause is vague, so it’s not clear if AWS allows, for example, processing images stored in S3 and serving them from a Lightsail instance, but if you’re using 100% Lightsail, you should be fine.
Offload To External Services
Sometimes, other services simply offer a better deal, and the solution is to not use AWS for the things that are costing you the most money. You’re generally not locked in to any particular cloud vendor, and there are many ways that using multiple services, or “multi-cloud,” can be beneficial.
One thing you’ll want to watch out for is transferring tons of data between clouds. For example, data transferred from AWS to Google Cloud Platform or Azure will count towards your data bill, because it’s still being transferred out from AWS over the open internet. If you’re not careful, multi-cloud can end up costing more money.
For example, AWS S3 can be expensive. You’d think the main cost would be data storage, but if you’re serving content from it, you’re also paying for data transfer out and for every request, with rates that depend on the request type. For high-traffic content, this can easily be hundreds of dollars a month while you pay next to nothing for “cheap storage.”
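To see how lopsided the bill can get, here is a small sketch that splits a month of S3 usage into its three parts. Every rate and the workload itself are assumptions for illustration only, and the function name is hypothetical; check the current S3 pricing page for real numbers.

```python
# All rates are assumed, illustrative USD figures, not authoritative pricing.
STORAGE_PER_GB_MONTH = 0.023
TRANSFER_OUT_PER_GB = 0.09
GET_PER_1000_REQUESTS = 0.0004


def s3_monthly_breakdown(stored_gb: float, transfer_out_gb: float,
                         get_requests: int) -> dict:
    """Split a month of S3 usage into storage, transfer, and request costs."""
    return {
        "storage": stored_gb * STORAGE_PER_GB_MONTH,
        "transfer": transfer_out_gb * TRANSFER_OUT_PER_GB,
        "requests": get_requests / 1000 * GET_PER_1000_REQUESTS,
    }


# Hypothetical workload: 50 GB stored, 5 TB served, 20 million GET requests.
bill = s3_monthly_breakdown(stored_gb=50, transfer_out_gb=5000,
                            get_requests=20_000_000)
for item, cost in bill.items():
    print(f"{item:>9}: ${cost:,.2f}")
print(f"{'total':>9}: ${sum(bill.values()):,.2f}")
```

With these assumed numbers, storage comes to roughly a dollar while data transfer runs to several hundred dollars, which is the pattern described above.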
One solution to this is to swap to another S3-compatible service. S3 has a published API, and other services can implement it, like DigitalOcean Spaces. Spaces is a barebones implementation, but it’s reliable and much cheaper than S3 for data costs. You can even self-host an S3-compatible server on your own machines.
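Because these services speak the same API, switching is often mostly a matter of pointing your existing S3 client at a different endpoint. Here is a minimal sketch with boto3. The helper names are hypothetical, and the endpoint pattern `https://<region>.digitaloceanspaces.com` is an assumption you should confirm against your provider’s documentation.

```python
def spaces_client_kwargs(region: str, access_key: str, secret_key: str) -> dict:
    """Keyword arguments that point an S3 client at Spaces instead of AWS."""
    return {
        "service_name": "s3",
        "region_name": region,
        # Assumed endpoint pattern; confirm it with your provider.
        "endpoint_url": f"https://{region}.digitaloceanspaces.com",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def make_spaces_client(region: str, access_key: str, secret_key: str):
    """Build a boto3 S3 client that talks to the S3-compatible endpoint."""
    import boto3  # third-party dependency: pip install boto3

    return boto3.client(**spaces_client_kwargs(region, access_key, secret_key))
```

A client built this way accepts the usual S3 calls, for example `client.upload_file("photo.jpg", "my-space", "photo.jpg")`, where the file and bucket names are placeholders. Not every S3 feature is necessarily available on a compatible service, so test the calls you depend on.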
Offload to Dedicated Servers with Fixed Bandwidth
With cloud services offering the ability to create and destroy hundreds of virtual machines at will, it’s easy to forget that the old school solution exists—buy a bare metal server in a datacenter.
Many companies offer dedicated servers that don’t nickel-and-dime you for data usage. OVH, the third-largest hosting provider in the world, sells machines that come with dedicated 500 Mbps connections to the open internet.
It’s not fancy, and it’s not the best practice solution, but if you want to save money, it’s always an option. You’ll still want to make sure that you’re not transferring lots of data out from AWS to another server.
Have On-Prem Hardware? Use AWS Direct Connect
A common problem for large companies is making the migration from on-premises hardware to cloud services. It’s sometimes not even beneficial to migrate everything you might run on-prem, so you usually end up with a hybrid solution using cloud hardware for the things that save the most money.
However, this can cost you money if you’re transferring data back and forth between AWS and on-prem, especially considering this charge isn’t present if you’re entirely using one or the other.
AWS has a solution for this called AWS Direct Connect, which is an enterprise-grade connection directly to AWS. It still charges for data, but at $0.02 per GB, it’s much less than standard pricing. It also offers dedicated bandwidth up to 100 Gbps.
Direct Connect isn’t just some service you enable, though. It requires an actual direct physical connection. This can be arranged in a couple of ways: colocate at an AWS Direct Connect datacenter, work with an AWS Partner to set up a connection to your datacenter, or purchase a physical Direct Connect node.
Either way, this option is specifically for large companies, and doesn’t make economic sense unless you have a lot of on-premises hardware and are transferring tons of data.
The Obvious Solution: Decrease Your Data Usage
Of course, you can always lower your data usage by optimizing the size of what you send. Enabling Gzip or Deflate compression for the web content served from your servers is an important first step. Any data that leaves AWS costs you money, so minimizing the size and number of responses should be a priority.
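To get a feel for what compression buys you, here is a minimal sketch using Python’s standard `gzip` module. The payload is made up and deliberately repetitive, so it compresses far better than typical real content would.

```python
import gzip

# Repetitive markup compresses very well; this payload is made up.
payload = ('<li class="item">Hello, world</li>\n' * 500).encode("utf-8")

compressed = gzip.compress(payload, compresslevel=6)
ratio = len(compressed) / len(payload)
print(f"original: {len(payload)} bytes, gzipped: {len(compressed)} bytes "
      f"({ratio:.1%} of original)")

# Decompressing gives back exactly the bytes we started with.
restored = gzip.decompress(compressed)
```

In practice you would rarely compress by hand. Web servers such as NGINX, and CloudFront itself, can compress responses for clients that advertise support through the `Accept-Encoding` header.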
For example, if you’re serving images from S3, you probably want to make sure they’re as optimized as possible. One of the benefits of AWS is easy automation, and it’s quite simple to set up automatic image processing using Lambda functions. This can easily halve the size of your images.
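One way this could look is a Lambda function triggered by S3 upload events that re-encodes each image and writes the result under a separate prefix. This is a sketch under several assumptions: the `optimized/` prefix, the JPEG quality of 80, and the function names are all hypothetical, and the Pillow library has to be packaged with the function (boto3 ships with the Lambda Python runtime).

```python
import io
import os
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "optimized/"  # hypothetical prefix for re-encoded copies


def parse_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    # Imported here so the helper above works without these packages.
    import boto3
    from PIL import Image

    s3 = boto3.client("s3")
    for bucket, key in parse_s3_event(event):
        if key.startswith(OUTPUT_PREFIX):
            continue  # skip our own output to avoid an endless trigger loop
        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(original)).convert("RGB")
        out = io.BytesIO()
        image.save(out, format="JPEG", quality=80, optimize=True)
        s3.put_object(
            Bucket=bucket,
            Key=OUTPUT_PREFIX + os.path.splitext(key)[0] + ".jpg",
            Body=out.getvalue(),
            ContentType="image/jpeg",
        )
```

Converting everything to JPEG drops transparency, so a real pipeline would probably branch on the source format. Writing to a separate prefix (or a separate bucket) matters because a function that writes back to the location that triggers it will invoke itself repeatedly.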
Whatever the case, you’ll want to take a look at your network architecture and see if there’s any way you can serve the same service using less data.