One benefit of public cloud infrastructure such as Amazon Web Services (AWS) is that it allows startups and established companies to quickly set up and start using resources. There is also a wide range of server types for the resources that require computing power, such as virtual servers, managed databases or caching. Some types offer greater computing power, some offer more memory, and some offer more network bandwidth. Both of these features can also be a mixed blessing when it comes to managing cloud costs. It’s easy to turn on instances and forget about them, or else choose the wrong instance size and be left with unused excess capacity. As usage scales up, these overlooked costs can add up, sometimes amounting to as much as 10% of the total bill.
Where do I begin?
Unused resources are one possible source of cost savings. Another place to start cost cutting is one that AWS itself suggests: savings plans (SPs) and reserved instances (RIs). Reserved instances are a way to purchase compute capacity at a discount for a specific type of instance. The narrower the scope of the instance type and location, the larger the discount that AWS offers. Savings plans work slightly differently. A savings plan is a minimum per-hour spending commitment by the AWS customer for a fixed period of time in exchange for compute capacity at a discounted rate. The discount offered by RIs and SPs increases with a 3-year commitment instead of 1 year. There is a further discount if the cost is fully paid up front rather than partially or spread out monthly. Amazon even has tools that suggest how many RIs or hours of SPs to purchase and show you the potential savings in various scenarios.
The paradox with AWS RIs and SPs is that if you start with the obvious option and purchase either one, you are locked into paying a potentially higher cost for capacity that you don’t need. Let’s say the RI cuts the cost by 30%, but reducing the instance size would cut the cost by 50%. With the RI you pay 70% of the original price instead of 50%, so you would end up paying 40% more for that instance than if you had just changed the instance type. On the other hand, the deeper analysis and extra effort required to find additional cost savings may delay any action to the point where an immediate commitment via SP or RI is still more economical.
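The arithmetic behind that 40% figure can be checked with a short sketch. The base cost of 100 is a hypothetical placeholder, not an AWS price, and the function name is made up for illustration; only the 30% and 50% discounts come from the example above.

```python
def cost_after_discount(base_cost: float, discount: float) -> float:
    """Return the cost left after applying a fractional discount (0.30 = 30%)."""
    return base_cost * (1 - discount)


# Hypothetical monthly on-demand cost of the oversized instance.
base = 100.0

ri_cost = cost_after_discount(base, 0.30)       # keep the size, buy an RI
resized_cost = cost_after_discount(base, 0.50)  # switch to a smaller instance

# How much more the RI option costs relative to simply resizing.
premium = ri_cost / resized_cost - 1

print(f"RI: {ri_cost:.2f}, resized: {resized_cost:.2f}, premium: {premium:.0%}")
```

The premium is measured against the resized cost, which is why a 20-point gap in discounts turns into a 40% overpayment.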
Looking beyond the obvious
There are many other, less obvious ways to reduce AWS costs than removing unused capacity or buying SPs and RIs. Finding them requires digging into specific services and the way they are architected, which sometimes has nothing to do with AWS itself. Take, for example, Redis, a popular key-value database that is used for caching, queuing and more. The main data handling process of the Redis server, the one that saves incoming data and sends out data when requested, is single-threaded. So, if it starts to slow down as the load on the server increases, adding more processors will not make it faster. I have seen a 48 vCPU instance used for Redis, where at least 40 vCPUs sat idle no matter what the system load was. Changing the instance type here to one with fewer vCPUs would save thousands of dollars a month.
Amazon Relational Database Service (RDS) is another area to explore for potential savings. Over time, the back-end software that uses databases increases the load on them with duplicated or inefficient queries, especially if the code uses object-relational mapping (ORM) tools. Costs here can arise from instance sizes that are too large, or from disk usage, measured in input/output operations per second (IOPS), that goes above the included baseline amount. A sub-optimal configuration of read replicas can also add unnecessary costs.
Running to keep up
One overlooked source of savings is in fact often regarded as a nuisance task and a cost with no benefit, and that is keeping up with the latest versions of both software and hardware. For example, the database engine used by MySQL v8.0 includes many efficiencies compared to v5.5 or v5.6. Yet if the business’s applications appear to be running just fine, it will want its engineering team to focus on building new functionality, and not on an effort that seemingly has no customer-facing benefit.
Take Redis versions as another example. The basic GET and SET commands have not changed since the earliest versions. If that is the only way you are using Redis, it is possible to keep running v3.2.6, which was released on October 26, 2016. However, there are hidden benefits to upgrading beyond v3.2.6, such as more efficient replication algorithms, that can translate to lower compute, memory and network requirements. These changes can also be beneficial in an outage situation, reducing time to recovery (TTR) and impact on customers.
Likewise, keeping up with newer hardware can also save costs. While much of AWS usage is with virtual instances, the hardware can matter. According to AWS, a newer Graviton processor on the host hardware can increase compute efficiency by 40%. If each instance handles 40% more work, the same load needs roughly 30% fewer instances in a Kubernetes cluster, with a comparable saving in cost. If your database is I/O heavy and you have provisioned IOPS using io2 storage, switching to gp3 could result in a 60% cost saving.
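A per-instance efficiency gain does not map one-to-one onto a reduction in instance count, so it is worth working through the numbers. The sketch below assumes the workload scales linearly across instances and that the 40% figure applies fully to your workload; the function name and the cluster size of 100 are hypothetical.

```python
import math


def instances_needed(current_instances: int, efficiency_gain: float) -> int:
    """Instances required for the same load if each one does
    (1 + efficiency_gain) times the work. Rounds up to a whole instance."""
    return math.ceil(current_instances / (1 + efficiency_gain))


# Hypothetical cluster size before the migration.
current = 100

needed = instances_needed(current, 0.40)
reduction = 1 - needed / current

print(f"{current} instances -> {needed} instances ({reduction:.0%} fewer)")
```

Real-world results depend on how well the workload runs on the new architecture, so a benchmark on a small node group is a safer basis for planning than the headline figure.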
Most businesses treat cost control as a priority at all times, and a periodic review, at least once a year, can help find savings that can be invested in other parts of the organization. These are just some ways to cut AWS costs. AWS has many services, and any number of them could be bloating the bill. If an organization doesn’t have the time or people to explore the options, bringing in an external consultant is worth considering. Depending on the amount saved, it might be well worth the fees.