Machine-Learning Based Rightsizing: Is it worth it?

August 01, 2022

Back in March, we shared an exciting story of creating a POC for a machine-learning-based infrastructure rightsizing mechanism.

Naturally, we could not set aside such an interesting initiative with such promising results, so we kept working on the tool to see how far we could take it and, of course, whether the benefits would justify the effort.

From POC to a Product

The initial POC version of Maestro Cost Advisor (that’s what we called it) was quite simple in terms of functionality: it took 4 days of virtual machine performance metrics, analyzed the CPU and memory load and the load timelines, and suggested one of the following actions for each instance:

Scale up
Scale down
Shutdown
Schedule

The mechanism analysed the real load on the instances and suggested new instance types based on the 90th percentile of each parameter.
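To make the percentile idea more concrete, here is a minimal sketch of how such a suggestion could be computed. It is not the actual Maestro Cost Advisor code: the instance catalogue, the `recommend` function, and the `target_util` headroom parameter are all hypothetical and used only for illustration.

```python
import math

# Hypothetical instance catalogue: (name, vCPUs, memory in GiB), smallest first.
CATALOGUE = [
    ("small", 2, 4),
    ("medium", 4, 8),
    ("large", 8, 16),
    ("xlarge", 16, 32),
]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend(cpu_pct, mem_pct, current, pct=90, target_util=80):
    """Suggest the smallest catalogue entry that fits the percentile load.

    cpu_pct / mem_pct are utilisation samples in percent of the current
    instance's capacity. target_util is an assumed safety margin: the
    percentile load must fit into that share of the candidate's capacity.
    """
    _, vcpus, mem_gib = current
    # Convert relative utilisation into absolute demand.
    cpu_need = percentile(cpu_pct, pct) / 100 * vcpus
    mem_need = percentile(mem_pct, pct) / 100 * mem_gib
    for name, c_vcpus, c_mem in CATALOGUE:
        fits_cpu = cpu_need <= c_vcpus * target_util / 100
        fits_mem = mem_need <= c_mem * target_util / 100
        if fits_cpu and fits_mem:
            return name
    # Nothing fits: fall back to the largest type we know about.
    return CATALOGUE[-1][0]


cpu = [10, 12, 15, 11, 13, 14, 12, 16, 18, 60]
mem = [30, 32, 35, 31, 33, 34, 32, 36, 38, 40]
suggestion = recommend(cpu, mem, current=("large", 8, 16))
```

Note that sizing for the 90th percentile deliberately ignores short spikes, such as the single 60% CPU sample above, which is one reason this approach alone is too simple for a business tool.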

The approach was good enough to prove that the mechanism would work, but definitely not enough to become a business tool.

Thus, we held a brainstorming session on how to turn our POC into an enterprise-ready product. Based on our experience of working with the cloud, as well as on conversations with our customers, we identified 13 possible features that could be added within a reasonable time. Out of them, we selected 4 that could be used to create a Minimum Viable Product (MVP).

Planning a Product

When selecting features for the MVP, we focused on the main tasks that had to be solved for our mechanism to bring real value to a business.

· Enabling API. An API-first approach enables quick configuration of the application and makes it possible to quickly integrate it with the customer’s own or third-party systems, bringing more value to the customer.

· Standalone model. The Maestro Cost Advisor POC was built on Amazon SageMaker, as it was the fastest way for the team to implement the ML part. Even though the tool can potentially work with data from any cloud (and uses Azure in the POC), having its “brain” in AWS could be a significant drawback, both for those who simply don’t want to use AWS and for those who want to keep all mechanisms within their existing infrastructure.
Thus, we need to rebuild the tool so that it can be deployed on any customer-defined platform, either in the cloud or on-premises.

· Workload-specific recommendations. Any enterprise has multiple environments and workload types. Production and testing, databases and load balancers: each instance has its own tasks and performance specifics. Naturally, the recommendations should take all of this into account to bring real value to the business.
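One possible way to express workload-specific behaviour is a set of sizing profiles keyed by environment and workload type. The sketch below is only an assumption about how this could look: the `Profile` class, the profile values, and the helper functions are hypothetical and do not describe the actual product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    percentile: int   # which load percentile to size for
    headroom: float   # extra capacity kept on top of that load (0.2 = 20%)


# Hypothetical values: production is sized more conservatively than testing.
PROFILES = {
    ("production", "database"): Profile(percentile=99, headroom=0.30),
    ("production", "load_balancer"): Profile(percentile=99, headroom=0.20),
    ("testing", "database"): Profile(percentile=90, headroom=0.10),
}
DEFAULT_PROFILE = Profile(percentile=90, headroom=0.20)


def profile_for(environment, workload):
    """Return the sizing profile for an instance, or the default one."""
    return PROFILES.get((environment, workload), DEFAULT_PROFILE)


def required_capacity(load_at_percentile, profile):
    """Capacity to provision for a given percentile load under a profile."""
    return load_at_percentile * (1 + profile.headroom)


prod_db = profile_for("production", "database")
unknown = profile_for("staging", "cache")
```

With such profiles, the same measured load can lead to different recommendations: a production database keeps more spare capacity than a test one.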

Is it worth it?

Putting effort into creating a minimum viable product is a serious decision. The effort needed for implementation should be carefully weighed against the possible benefits.

We decided to check this by calculating the possible Return on Investment (ROI) for a large enterprise infrastructure where we would be able to try our product in action once it is developed.

In the calculation, we took into account the time needed to finalize the ML-based recommendation mechanism and integrate it with Maestro (3 months), the Maestro licence (priced based on the cost of the covered infrastructure), and the estimated cost of a large-scale enterprise infrastructure.
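For readers who want to run a similar estimate, the calculation can be sketched roughly as follows. This is a simplified model rather than our exact formula: the `roi` function, its parameter names, and every number in the example are placeholders chosen for illustration, and the expected savings rate in particular is an assumption.

```python
def roi(infra_monthly_cost, savings_rate, dev_cost, licence_rate, months=12):
    """Return ROI as a fraction: (savings - investment) / investment.

    infra_monthly_cost: monthly cost of the covered infrastructure
    savings_rate:       assumed share of that cost saved by rightsizing
    dev_cost:           one-off cost of the development work
    licence_rate:       licence price as a share of the covered infrastructure cost
    months:             period over which savings and licence are counted
    """
    savings = infra_monthly_cost * savings_rate * months
    licence = infra_monthly_cost * licence_rate * months
    investment = dev_cost + licence
    return (savings - investment) / investment


# Placeholder figures only, not the numbers from our calculation.
example = roi(
    infra_monthly_cost=100_000,
    savings_rate=0.10,
    dev_cost=30_000,
    licence_rate=0.01,
)
print(f"ROI over 12 months: {example:.0%}")
```

Because the one-off development cost is part of the investment, the ROI in this model grows as the covered infrastructure gets larger or the evaluation period gets longer.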

The result looks more than promising:

Still, the mechanism is currently considered a future part of Maestro, and its implementation is not a top-priority task at the moment.

This means that once the development is over, the price of introducing machine learning into the FinOps processes of Maestro customers will be even smaller, and the ROI will grow tremendously.
